Conversation

@mmathew23 (Collaborator)

GptAttention sits deeper inside the model than the level at which for_inference and for_training patch modules. To make sure those modules are switched on for training or off for inference, call model.train() and model.eval() respectively; a sketch of why that works follows below.
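
A minimal sketch of why the train/eval toggle is the reliable switch here: PyTorch's model.train() and model.eval() recurse through every submodule, so even a module nested well below the level that for_inference/for_training touch gets its training flag flipped. The toy module tree below is illustrative, not Unsloth's actual structure:

```python
import torch.nn as nn

# Toy stand-in: an attention module nested two levels deep, mirroring how
# GptAttention sits below the level that for_inference/for_training patch.
model = nn.Sequential(
    nn.Sequential(
        nn.MultiheadAttention(embed_dim=8, num_heads=2),
        nn.Dropout(p=0.1),  # dropout behaves differently in train vs. eval
    )
)

model.eval()   # recursively sets .training = False on every submodule
assert all(not m.training for m in model.modules())

model.train()  # recursively sets .training = True everywhere again
assert all(m.training for m in model.modules())
```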

GPT-OSS inference now works with use_cache=True.

https://colab.research.google.com/drive/11qS1-C86hr8twvo-y_Po4-pYYBRepT_l?usp=sharing
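
A minimal usage sketch of the fixed inference path, assuming Unsloth's FastLanguageModel loader; the checkpoint name, prompt, and generation parameters are illustrative assumptions, not taken from this PR:

```python
from unsloth import FastLanguageModel

# Checkpoint name, sequence length, and quantization flag are assumptions
# for illustration only.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=2048,
    load_in_4bit=True,
)

model.eval()  # flip inference-mode behaviour on for all nested modules

inputs = tokenizer(
    "Explain KV caching in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```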

@danielhanchen merged commit cf9a00e into unslothai:main on Sep 8, 2025.