-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode #28935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a potential TypeError when splitting_ops is None. The added check prevents a crash. I've suggested a small improvement to make the code more concise and idiomatic, which enhances readability and maintainability.
|
@angelayi Could you please review this? Is this change appropriate since splitting_ops might be None? |
|
Note: |
angelayi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
yewentao256
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you take a look the original cause, namely, why there is a None in the context and fix that?
vllm/config/vllm.py
Outdated
| is_fullgraph = ( | ||
| self.compilation_config.use_inductor_graph_partition | ||
| or len(self.compilation_config.splitting_ops) == 0 | ||
| or self.compilation_config.splitting_ops == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right above this line, we call self.compilation_config.set_splitting_ops_for_v1() which sets the splitting ops to a non-None value. What case are you running that has the SP pass enabled but compilation mode != VLLM_COMPILE? That's not a supported case.
I do agree we should handle invalid configuration gracefully. If you want to fix this, can you move the call to set_splitting_ops_for_v1 outside of the if, and just set to empty list if mode != VLLM_COMPILE? and then if enable_sequence_parallelism is True but mode is wrong, add a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if enalbe sp with eager mode, the compile mode is None
Lines 440 to 444 in da94c7c
| if self.compilation_config.mode is None: | |
| if self.model_config is not None and not self.model_config.enforce_eager: | |
| self.compilation_config.mode = CompilationMode.VLLM_COMPILE | |
| else: | |
| self.compilation_config.mode = CompilationMode.NONE |
I know that sp should work with graph mode in vLLM. But for some platform, such as vllm-ascend, it doesn't work with graph pass currently, so it implements sp with another way to let sp work with eager mode. The e2e test is:
https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/multicard/test_offline_inference_distributed.py#L169
Future plan:
We're working on custom graph pass feature #28623 Once it's done, we'll implement sp with the same way with vLLM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thanks for the clarification. Can you update set_splitting_ops_for_v1 so that splitting ops aren't empty during enforce eager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Facing the same issue. @wangxiyuan Gentle ping if there's any update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leo-pony please update the PR ASAP, thanks.
… Eager mode Signed-off-by: leo-pony <[email protected]>
…able SP in Eager mode (vllm-project#28935) Signed-off-by: leo-pony <[email protected]> Signed-off-by: Hashem Hashemi <[email protected]>
…able SP in Eager mode (vllm-project#28935) Signed-off-by: leo-pony <[email protected]> Signed-off-by: Bofeng BF1 Xue <[email protected]>
…able SP in Eager mode (vllm-project#28935) Signed-off-by: leo-pony <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Before get lens of splitting_ops should first check it is not None; otherwise, in piecewise mode without set splitting_ops error will occur:
The pull request for this issue is here: #27126
For the value of splitting_ops: If None, defaults to attention ops for piecewise cudagraphs , and If empty list [], no ops are excluded (suitable for full cudagraphs). Detail see comments from vllm/config/compilation.py:
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.