[Platform] Make forward context manager pluggable for other device #29388
Conversation
Signed-off-by: shen-shanshan <[email protected]>
Code Review
This pull request refactors the usage of `set_forward_context` to be pluggable, allowing different platforms to provide their own forward context manager. This is achieved by introducing a `get_forward_context_manager` method in the `Platform` interface. The implementation is sound and improves modularity. My only feedback is on a minor code duplication issue in `qwen2_5_vl.py`, where `current_platform` is imported locally in two separate methods. I've suggested a refactoring to improve maintainability.
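For readers outside the diff view, the shape of the change is roughly the following. This is a minimal sketch, not the exact diff: the docstring and default body are assumptions, though `vllm.forward_context.set_forward_context` and the `Platform` base class are existing vLLM symbols.

```python
# Sketch of the new hook on the platform interface.
from typing import Callable


class Platform:

    @classmethod
    def get_forward_context_manager(cls) -> Callable:
        """Return the context manager used to set the forward context.

        Out-of-tree device plugins can override this to plug in their own
        forward-context implementation without patching modeling files.
        """
        # Deferred import, assuming it avoids a cycle between the platform
        # interface and the forward-context module.
        from vllm.forward_context import set_forward_context

        return set_forward_context
```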
Signed-off-by: shen-shanshan <[email protected]>
vllm/model_executor/models/qwen2_5_vl.py

```python
self.language_model.make_empty_intermediate_tensors
)

from vllm.platforms import current_platform
```
I think there is no need to import this lazily, just import it from top level
done.
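The fix the reviewer asks for is mechanical hoisting, roughly as below; the class name matches the model file, but the two method names are illustrative placeholders for the actual call sites.

```python
# Before: the lazy import is duplicated in each method that needs it.
class Qwen2_5_VLForConditionalGeneration:

    def _process_image_input(self, image_input):
        from vllm.platforms import current_platform
        return current_platform.get_forward_context_manager()

    def _process_video_input(self, video_input):
        from vllm.platforms import current_platform  # duplicated
        return current_platform.get_forward_context_manager()


# After: one top-level import serves both call sites.
from vllm.platforms import current_platform
```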
vllm/platforms/interface.py
```python
return max_model_len

@classmethod
def get_forward_context_manager(cls):
```
Can you add a type hint here?
done.
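One plausible way to satisfy the request; the annotation actually merged may differ. Since `set_forward_context` is a generator-based context manager, a `Callable` returning an `AbstractContextManager` fits.

```python
from contextlib import AbstractContextManager
from typing import Callable


class Platform:

    @classmethod
    def get_forward_context_manager(cls) -> Callable[..., AbstractContextManager]:
        # set_forward_context is @contextmanager-decorated, so calling it
        # returns a context manager object.
        from vllm.forward_context import set_forward_context

        return set_forward_context
```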
Signed-off-by: shen-shanshan <[email protected]>
After discussing offline with @wangxiyuan, this PR is not needed and we decided to directly refactor our …
…VisionAttention (#4349)

### What this PR does / why we need it?
- [x] Patch `Qwen2_5_VisionAttention` with `AscendQwen2_5_VisionAttention`.
- [x] Replace `AscendQwen2_5_VisionTransformer` with `Qwen2_5_VisionTransformer` in vllm.
- [x] Move padding logic (q/k/v and cos/sin) before FA to `forward()` of `Qwen2_5_VisionAttention`.
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative form to intervals and move it to cpu (compatible with npu FA); see the sketch after this message.
- [x] Remove Qwen2.5-VL modeling files.
- [x] Remove Qwen2.5-VL (without padding) modeling files.
- [x] Remove related UT.
- [x] Make `set_forward_context` pluggable when getting MM embedding. Find more details at vllm-project/vllm#29388.
- [x] Simplify padding logic for FA.
- [x] Add patch for vllm-project/vllm#28798.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark

- vLLM version: v0.11.2

Signed-off-by: shen-shanshan <[email protected]>
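The `cu_seqlens` conversion in the checklist above is the standard cumulative-to-intervals step; a self-contained sketch (the NPU kernel's exact expectations are an assumption here):

```python
import torch

# Cumulative form: [0, 3, 7, 12] encodes three sequences of lengths
# 3, 4 and 5. The interval form is just the adjacent differences,
# moved to CPU for the NPU flash-attention kernel.
cu_seqlens = torch.tensor([0, 3, 7, 12])
seq_lens = (cu_seqlens[1:] - cu_seqlens[:-1]).cpu()  # tensor([3, 4, 5])
```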
Purpose
Currently, `set_forward_context` is used directly in some modeling files, such as Qwen2.5-VL (when processing image/video input with the ViT). We want to use a custom forward context there (without modifying the modeling files in the plugin), such as https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/ascend_forward_context.py#L58.

Thus, this PR makes the forward context manager pluggable for other devices.
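With that hook in place, a device plugin can supply its own context manager without touching the modeling code. A hedged sketch follows: the `NPUPlatform` class and the body of `set_ascend_forward_context` are illustrative, standing in for the real implementation in `vllm_ascend/ascend_forward_context.py`.

```python
from contextlib import contextmanager

from vllm.platforms.interface import Platform


@contextmanager
def set_ascend_forward_context(*args, **kwargs):
    # Device-specific setup (NPU attention state, custom metadata, ...)
    # would go here before the model forward runs under this context.
    yield


class NPUPlatform(Platform):

    @classmethod
    def get_forward_context_manager(cls):
        return set_ascend_forward_context
```

Modeling code then asks `current_platform.get_forward_context_manager()` for whatever the active platform provides instead of hard-coding `set_forward_context`, which is what spares the plugin from forking the modeling files.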
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.