-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
[Bugfix] Fix block size in block_table with PCP #29094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Livinfly <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a bug in the block_table by factoring in pcp_world_size for block size calculations, which is essential for correct behavior with Prefill Context Parallelism. The logic change is sound. I have included one review comment suggesting a refactoring to address code duplication for fetching distributed world sizes. This will improve maintainability and prevent potential inconsistencies in the future.
|
Yes, we did overlook the logic modifications for MultiGroupBlockTable. Thank you very much for fixing this. |
pisceskkk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for fix!
Signed-off-by: Livinfly <[email protected]>
Signed-off-by: Livinfly <[email protected]>
Signed-off-by: Livinfly <[email protected]> Signed-off-by: Runkai Tao <[email protected]>
Signed-off-by: Livinfly <[email protected]>
Signed-off-by: Livinfly <[email protected]>
Signed-off-by: Livinfly <[email protected]>
I discovered a potential issue regarding the missing logic of the block_table in #28718.
In the PCP rank, it should also store only one copy, so the parameter passed from
MultiGroupBlockTabletoBlockTableshould likely be multiplied bycp_world_size. The original code seems to be missing thePCP_world_sizefactor.Could you please verify this, or am I mistaken?
@pisceskkk
Purpose
fix block_table
Test Plan
NA
Test Result
NA
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.