[Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 #33637
base: main
Signed-off-by: chaunceyjiang <[email protected]>
Code Review
This pull request aims to fix an issue with DeepSeek R1 using CUTLASS MLA on B200 GPUs by providing a default value for q_pad_num_heads. However, the current implementation is not correct as the default value is set in a place where it will not be used by the padding logic. The fix needs to be applied in a different location to be effective. I've left a critical comment explaining the issue with the current approach.
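The failure mode described in the review, a default that is set in a location the padding logic never reads, can be illustrated with a small sketch. This is not vLLM code: `pad_heads`, `Backend`, and `Layer` are hypothetical names, plain lists stand in for tensors, and which object the real padding logic reads from is an assumption made only for illustration.

```python
from typing import Optional


def pad_heads(q: list[list[float]], q_pad_num_heads: Optional[int]) -> list[list[float]]:
    """Pad the head dimension of q with zero vectors up to q_pad_num_heads.

    q is a list of per-head vectors; None means no padding was requested.
    """
    if q_pad_num_heads is None or q_pad_num_heads <= len(q):
        return q
    head_dim = len(q[0])
    padding = [[0.0] * head_dim for _ in range(q_pad_num_heads - len(q))]
    return q + padding


class Backend:
    """Hypothetical backend object that the padding logic actually consults."""

    def __init__(self, q_pad_num_heads: Optional[int] = None):
        self.q_pad_num_heads = q_pad_num_heads


class Layer:
    """Hypothetical layer that also stores a default, which forward() ignores."""

    def __init__(self, backend: Backend, q_pad_num_heads: int = 128):
        # Assumption for illustration: the default lands here...
        self.q_pad_num_heads = q_pad_num_heads
        self.backend = backend

    def forward(self, q: list[list[float]]) -> list[list[float]]:
        # ...but padding reads the backend's value, so the default has no effect.
        return pad_heads(q, self.backend.q_pad_num_heads)


q = [[1.0] * 4 for _ in range(16)]  # 16 heads, head_dim 4

ineffective = Layer(Backend()).forward(q)
effective = Layer(Backend(q_pad_num_heads=128)).forward(q)
print(len(ineffective), len(effective))
```

In this sketch the first call leaves the 16 heads unpadded even though `Layer` carries a default of 128, while setting the value where `forward()` reads it produces the padded result.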
Signed-off-by: chaunceyjiang <[email protected]>
Hi @chaunceyjiang, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: chaunceyjiang <[email protected]>
/cc @MatthewBonanni @LucasWilkinson PTAL.
MatthewBonanni left a comment
Thanks for the fix! LGTM once comment is addressed
Co-authored-by: Matthew Bonanni <[email protected]> Signed-off-by: Chauncey <[email protected]>
Ah actually, I think the current state of the PR will force
I think the proper fix would be:
Purpose
FIX #33627
Test Plan
Test Result