
Conversation

@ywang96 (Member) commented Nov 23, 2025

Purpose

Reopened from #28287

This PR completely removes the dependency on the xformers library and should only be merged after the v0.11.1 release. The rationale for removing xformers is that:

  1. xformers is used for multimodal attention (MHA), but we have alternative attention backends that can replace it (a minimal sketch follows this list).
  2. We have an xformers attention backend for the decoder LM, but it is no longer used for anything.
  3. Having another external dependency puts extra risk on our releases, a hard lesson we learned while upgrading to PyTorch 2.9.
  4. [Attention] FA2&FA3 support more head sizes, ViT support, make default backend #28763 added FA support for head sizes we previously did not support, which makes xformers no longer necessary.
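
As a rough illustration of point 1: multimodal MHA that previously routed through xformers' memory_efficient_attention can be served by PyTorch's built-in SDPA. The helper below is a minimal sketch with assumed shapes, not vLLM's actual implementation:

```python
import torch
import torch.nn.functional as F

def sdpa_mha(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # xformers uses (batch, seq, heads, head_dim);
    # scaled_dot_product_attention expects (batch, heads, seq, head_dim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)  # back to (batch, seq, heads, head_dim)
```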

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 23, 2025
@mergify (bot) commented Nov 23, 2025

Documentation preview: https://vllm--29262.org.readthedocs.build/en/29262/

@mergify mergify bot added documentation Improvements or additions to documentation ci/build qwen Related to Qwen models nvidia labels Nov 23, 2025
@mergify mergify bot added the v1 label Nov 23, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively deprecates the xformers dependency. The changes are comprehensive, removing xformers from requirements, Dockerfiles, documentation, and tests. The core logic is updated to remove the xformers attention backend, with TORCH_SDPA being used as a fallback in some cases, such as in the keye model. The pixtral model, which still relies on xformers, has been updated with a comment to clarify that xformers is now an optional dependency for that specific model. The changes are clean and well-aligned with the goal of deprecating xformers.
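
The optional-dependency handling described for pixtral typically looks like a guarded import; a hypothetical sketch (the flag name is an assumption, not vLLM's actual code):

```python
try:
    # xformers stays optional: only pixtral needs it, per the review above.
    from xformers import ops as xops
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False
```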

@ywang96 (Member, Author) commented Nov 23, 2025

@codex review

@ywang96 (Member, Author) commented Nov 23, 2025

Turning on CI to make sure there's no regression.

@chatgpt-codex-connector (bot) commented
Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@DarkLight1337 (Member) left a comment

LGTM if tests pass

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 23, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 23, 2025 09:20
@jeejeelee (Collaborator) left a comment

Thank you for this great work

@yewentao256 (Member) left a comment

LGTM, thanks for the work!
The CI error seems related:

[2025-11-23T09:50:01Z] __________ ERROR collecting tests/kernels/attention/test_attention.py __________
[2025-11-23T09:50:01Z] In test_num_heads_not_divisble_by_num_kv_heads: function uses no argument 'device'
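
That pytest error typically means a parametrized argument no longer appears in the test's signature; a minimal, hypothetical reproduction:

```python
import pytest

# Parametrizing "device" while the test omits the argument makes pytest
# fail collection with: "function uses no argument 'device'".
@pytest.mark.parametrize("device", ["cuda:0"])
def test_num_heads_not_divisble_by_num_kv_heads():  # missing `device` param
    pass
```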

@mgoin (Member) commented Nov 23, 2025

Can you link the PR that expanded FA2 to support the vision encoder, to help explain why we can remove this now?

@DarkLight1337 DarkLight1337 merged commit 0ff7082 into vllm-project:main Nov 24, 2025
90 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 24, 2025
lpapavassiliou pushed a commit to lpapavassiliou/vllm that referenced this pull request Nov 24, 2025
RunkaiTao pushed a commit to RunkaiTao/vllm that referenced this pull request Nov 24, 2025
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Runkai Tao <[email protected]>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
oscardev256 added a commit to oscardev256/vllm that referenced this pull request Dec 2, 2025
2. Remove deprecated xformers (vllm-project#29262)
3. Updated _get_prompt_updates()

Signed-off-by: Oscar Gonzalez <[email protected]>
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Dec 2, 2025
1. fix vllm-project/vllm#28542
   The model structure modifications involved are:
     - Qwen2.5-VL (some patches still exist)
     - Qwen2-VL
     - Qwen2
     - DeepSeek series
     - Qwen-moe series
2. fix vllm-project/vllm#29121
   The output token type changed from NumPy arrays to `list[list[int]]` (see the sketch after this commit message).
3. fix vllm-project/vllm#29262
   The `xformers` backend for multimodal attention has now been deprecated.
4. fix vllm-project/vllm#29342
5. fix vllm-project/vllm#28579
6. fix vllm-project/vllm#28718
7. fix vllm-project/vllm#28665
8. fix vllm-project/vllm#26847
   vLLM introduced the `optimization-level` option; some default configs have changed, and the `--enforce-eager` parameter has been deprecated.
9. fix vllm-project/vllm#29223; it returns a tuple for the sampler.
10. fix vllm-project/vllm#29471; we'll remove the related patch to avoid this kind of error.

Co-authored-by: hfadzxy <[email protected]>
Co-authored-by: wangli <[email protected]>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: hfadzxy <[email protected]>
Co-authored-by: wangli <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
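
For the output-token type change in item 2, the downstream adaptation is small; a hypothetical sketch assuming tokens arrive as a 2-D NumPy array:

```python
import numpy as np

tokens = np.array([[1, 2, 3], [4, 5, 6]])  # old: NumPy array output

# New contract: list[list[int]]. tolist() converts and yields plain ints.
token_lists: list[list[int]] = tokens.tolist()
assert token_lists == [[1, 2, 3], [4, 5, 6]]
```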
ChenCangtao pushed a commit to ChenCangtao/vllm-ascend that referenced this pull request Dec 3, 2025