Fix num tokens when sp moe enabled #25828
base: main
Conversation
Signed-off-by: Wuxun Zhang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only a small and essential subset of CI tests runs automatically to quickly catch errors; you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the `ready` label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly fixes a bug in the GraniteMoeMoE layer where num_tokens was miscalculated when sequence parallelism is enabled with 3D input tensors. The change ensures num_tokens is derived after flattening the hidden_states tensor, which is the correct approach for handling inputs of various dimensions. The fix is sound and resolves the issue described. I have not found any high or critical severity issues in the proposed changes.
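For illustration only, here is a minimal sketch of the pattern the review describes; the function name and shapes are hypothetical and this is not the actual GraniteMoeMoE code. The point is that `num_tokens` is taken from the flattened tensor rather than from the leading dimension of the raw input.

```python
import torch

def moe_forward_sketch(hidden_states: torch.Tensor, hidden_dim: int) -> torch.Tensor:
    # Flatten first so 3D (bs, seqlen, hdim) and 2D (tokens, hdim) inputs are handled uniformly.
    flat = hidden_states.view(-1, hidden_dim)
    # Derive num_tokens from the flattened view, not from hidden_states.shape[0].
    num_tokens = flat.shape[0]
    # ... expert routing / dispatch would consume num_tokens here ...
    return flat  # placeholder for the real MoE computation
```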
@tlrmchlsmth Hi, could you please take a look at this? I am not able to trigger CI.
@tlrmchlsmth please take another look, thanks!
Update: for Gaudi, we are going to follow the same assumption (2D hidden states) as the community. We currently have a flag for this, so this is no longer a hard requirement. That said, I think this is still a valid fix for 1D hidden states (as mentioned in the code comment), though I'm not sure in which scenario the hidden states would be 1D.
After vllm-project/vllm#24982 was merged, sequence-parallel MoE is turned on when `enable_expert_parallel=True`, `tp_size > 1`, and `dp_size > 1`. Since there is no choice of `VLLM_ALL2ALL_BACKEND` for Gaudi, we cannot easily bypass it, so this PR aims to support the feature.

```python
class ParallelConfig:
    @property
    def use_sequence_parallel_moe(self) -> bool:
        return (envs.VLLM_ALL2ALL_BACKEND in ("allgather_reducescatter",
                                              "naive",
                                              "deepep_high_throughput",
                                              "deepep_low_latency")
                and self.enable_expert_parallel
                and self.tensor_parallel_size > 1
                and self.data_parallel_size > 1)
```

Update: No hard requirement on vllm-project/vllm#25828

---------

Signed-off-by: Wuxun Zhang <[email protected]>
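As a quick sanity check of the gating condition quoted above, here is a standalone re-implementation; it does not import vLLM, and the parameter names simply mirror the snippet.

```python
def use_sequence_parallel_moe(all2all_backend: str, enable_expert_parallel: bool,
                              tp_size: int, dp_size: int) -> bool:
    # Mirrors the ParallelConfig property quoted above.
    return (all2all_backend in ("allgather_reducescatter", "naive",
                                "deepep_high_throughput", "deepep_low_latency")
            and enable_expert_parallel
            and tp_size > 1
            and dp_size > 1)

# With expert parallelism enabled and TP > 1, DP > 1, the SP-MoE path is taken whenever
# the backend is one of the listed values (which is always the case on Gaudi).
assert use_sequence_parallel_moe("naive", True, 2, 2)
assert not use_sequence_parallel_moe("naive", True, 2, 1)
```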
For some backends like Gaudi, the input shape of the first MoE layer can be 3D (bs, seqlen, hdim). So when SP MoE is enabled (#24982), `num_tokens` may get a wrong value.

cc @tlrmchlsmth
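To make the failure mode concrete, here is a small illustration with made-up shapes: for a 3D input, taking the token count from the leading dimension yields the batch size rather than the number of tokens.

```python
import torch

bs, seqlen, hdim = 2, 8, 32
hidden_states = torch.randn(bs, seqlen, hdim)

wrong = hidden_states.shape[0]                 # 2  -> batch size, not token count
right = hidden_states.view(-1, hdim).shape[0]  # 16 -> bs * seqlen tokens
print(wrong, right)
```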
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.