[Core] Deprecate xformers
#29262
Conversation
Documentation preview: https://vllm--29262.org.readthedocs.build/en/29262/
Code Review
This pull request effectively deprecates the `xformers` dependency. The changes are comprehensive, removing `xformers` from requirements, Dockerfiles, documentation, and tests. The core logic is updated to remove the `xformers` attention backend, with `TORCH_SDPA` used as a fallback in some cases, such as in the keye model. The pixtral model, which still relies on `xformers`, has been updated with a comment clarifying that `xformers` is now an optional dependency for that specific model. The changes are clean and well aligned with the goal of deprecating `xformers`.
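For illustration, a minimal sketch of the optional-dependency pattern described above: use `xformers` if it happens to be installed, otherwise fall back to torch SDPA. This is not vLLM's actual code; `mha` and `HAS_XFORMERS` are hypothetical names.

```python
# Hypothetical sketch of treating xformers as an optional dependency
# with a torch SDPA fallback; not vLLM's actual implementation.
import torch
import torch.nn.functional as F

try:
    from xformers.ops import memory_efficient_attention
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False


def mha(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Multi-head attention over tensors shaped (batch, seq, heads, head_dim)."""
    if HAS_XFORMERS:
        # xformers consumes the (batch, seq, heads, head_dim) layout directly.
        return memory_efficient_attention(q, k, v)
    # torch SDPA expects (batch, heads, seq, head_dim), so transpose around it.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    )
    return out.transpose(1, 2)
```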
@codex review

Turning on CI to make sure there's no regression.
Codex Review: Didn't find any major issues. Keep it up! 👍
DarkLight1337
left a comment
LGTM if tests pass
jeejeelee
left a comment
Thank you for this great work
yewentao256
left a comment
LGTM, thanks for the work!
The CI error seems related
[2025-11-23T09:50:01Z] __________ ERROR collecting tests/kernels/attention/test_attention.py __________
[2025-11-23T09:50:01Z] In test_num_heads_not_divisble_by_num_kv_heads: function uses no argument 'device'
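For context, this class of pytest collection error typically means a `@pytest.mark.parametrize` mark names an argument that the test function's signature does not accept. A minimal reproduction sketch (the parametrized values are hypothetical, not taken from the actual test):

```python
# Minimal reproduction sketch of this class of pytest collection error:
# the parametrize mark supplies `device`, but the function doesn't take it.
import pytest


@pytest.mark.parametrize("device", ["cuda:0"])  # hypothetical values
def test_num_heads_not_divisble_by_num_kv_heads():  # missing `device` arg
    ...
```

Collecting this yields `In test_num_heads_not_divisble_by_num_kv_heads: function uses no argument 'device'`, matching the log above.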
Can you link the PR that expanded FA2 to support the vision encoder, to help explain why we can remove this now?
Referenced by a downstream commit:

1. Fix vllm-project/vllm#28542. The model structures we modified are: Qwen2.5-VL (some patches still exist), Qwen2-VL, Qwen2, the DeepSeek series, and the Qwen-MoE series.
2. Fix vllm-project/vllm#29121: the output token type changed from numpy to `list[list[int]]`.
3. Fix vllm-project/vllm#29262: the `xformers` backend for multimodal has now been deprecated.
4. Fix vllm-project/vllm#29342.
5. Fix vllm-project/vllm#28579.
6. Fix vllm-project/vllm#28718.
7. Fix vllm-project/vllm#28665.
8. Fix vllm-project/vllm#26847: vLLM introduced `optimization-level`, some default configs have changed, and the `--enforce-eager` param has been deprecated.
9. Fix vllm-project/vllm#29223: the sampler now returns a tuple.
10. Fix vllm-project/vllm#29471: we'll remove the related patch to avoid this kind of error.

vLLM version: v0.11.2

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: hfadzxy <[email protected]>
Co-authored-by: wangli <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
Purpose
Reopened from #28287
This PR completely removes the dependency on the `xformers` library and should only be merged after the v0.11.1 release. The rationale behind removing `xformers` is that:

- `xformers` is used for multimodal attention (MHA), but we have alternative attention backends that can replace it (see the sketch after this list).
- There is still an `xformers` attention backend for the decoder LM, but it's no longer used for anything.
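As an illustration of the first point (not code from this PR), torch's built-in SDPA can reproduce `xformers`' `memory_efficient_attention` numerically, which is what makes the replacement viable. A quick sanity-check sketch, assuming a CUDA device, fp16 inputs, and that `xformers` is still installed for the comparison:

```python
# Sanity-check sketch: torch SDPA vs. xformers memory_efficient_attention.
# Assumes a CUDA device and that xformers is (still) installed.
import torch
import torch.nn.functional as F
from xformers.ops import memory_efficient_attention

# (batch, seq, heads, head_dim) layout, as used by xformers.
q, k, v = (
    torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)
ref = memory_efficient_attention(q, k, v)
# torch SDPA expects (batch, heads, seq, head_dim), so transpose around it.
out = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)
torch.testing.assert_close(out, ref, rtol=2e-3, atol=2e-3)
```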
Test Plan

Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` updated for a new model.