
Conversation

@kandelak commented on May 16, 2025

There is a bug in the calculation of attention when fused_attn is set to false.

To reproduce it, set fused_attn to false: reconstruction quality becomes very poor, whereas with fused_attn set to true (the default) it works. With this change it works again (the non-fused path now basically reimplements what the fused attention does).

Possible reason: the model was trained using F.scaled_dot_product_attention, which is internally different from the "else branch", where the attention is computed in the non-efficient way.
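For illustration, here is a minimal sketch (not the exact diff from this PR) of a non-fused attention computation written to match F.scaled_dot_product_attention numerically; the tensor shapes and the manual_attention helper name are assumptions for the example:

```python
import math
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Non-fused attention intended to mirror F.scaled_dot_product_attention:
    # scale scores by 1/sqrt(head_dim), softmax over the key dimension.
    scale = 1.0 / math.sqrt(q.size(-1))
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    return attn @ v

# Quick numerical check against the fused kernel (shapes are hypothetical).
q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))  # (batch, heads, tokens, head_dim)
fused = F.scaled_dot_product_attention(q, k, v)
assert torch.allclose(manual_attention(q, k, v), fused, atol=1e-5)
```

A subtle mismatch in this path (e.g. a missing or misplaced scale factor) is enough to degrade reconstruction for a model trained with the fused kernel.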

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on May 16, 2025
@kandelak (Author) commented

As a reminder: this problem has also come up for other users (see #149, for instance), and this PR solves it.

