GptAttention turn training off during inference #3289

mmathew23 · 2025-09-08T20:37:35Z

GptAttention sits deeper in the module tree than `for_inference` and `for_training` reach. To make sure those modules are switched off for inference and back on for training, we call `model.eval()` and `model.train()` respectively.
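Below is a minimal sketch, assuming PyTorch, of why `model.eval()` / `model.train()` reach deeply nested modules. `nn.Module.train(mode)` sets `training` on the module and then recurses through every child, and `eval()` is shorthand for `train(False)`. Assigning the flag on a single module does not propagate. `ToyGptAttention`, `ToyBlock`, `ToyModel`, `for_inference`, and `for_training` are hypothetical stand-ins for illustration, not Unsloth's actual classes or helpers.

```python
import torch
from torch import nn


class ToyGptAttention(nn.Module):
    """Hypothetical stand-in for a deeply nested attention module."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A real attention layer might branch on self.training here,
        # e.g. to enable a KV-cache path only during inference.
        return self.proj(x)


class ToyBlock(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.attn = ToyGptAttention(dim)


class ToyModel(nn.Module):
    def __init__(self, dim: int = 8, n_layers: int = 2) -> None:
        super().__init__()
        self.layers = nn.ModuleList(ToyBlock(dim) for _ in range(n_layers))


def for_inference(model: nn.Module) -> nn.Module:
    # Hypothetical helper: eval() calls train(False) recursively,
    # so every nested attention module ends up with training=False.
    model.eval()
    return model


def for_training(model: nn.Module) -> nn.Module:
    # Hypothetical helper: train() flips training=True on all submodules.
    model.train()
    return model


model = ToyModel()

# Setting the flag only on the top-level module does NOT reach nested layers.
model.training = False
print(model.layers[0].attn.training)  # True

for_inference(model)
print(all(not m.training for m in model.modules()))  # True

for_training(model)
print(all(m.training for m in model.modules()))  # True
```

Relying on the recursive `train()`/`eval()` calls means no helper has to know how deep the attention modules are nested.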

gpt-oss inference now works with `use_cache=True`.

https://colab.research.google.com/drive/11qS1-C86hr8twvo-y_Po4-pYYBRepT_l?usp=sharing

Commit 3e2c067: GptAttention turn training off during inference

danielhanchen merged commit cf9a00e into unslothai:main Sep 8, 2025
