[BugFix] Cosmos-Reason1-7B Model Flash Attention requires head dim to be a multiple of 32 #29615
Conversation
Code Review
This pull request fixes a crash that occurs when FlashAttention is used with a head dimension that is not a multiple of 32 in the Qwen2.5-VL vision encoder. The change detects this condition and falls back to the TORCH_SDPA backend, preventing the runtime error. The check is clearly implemented and correctly placed in the Qwen2_5_VisionAttention module's initialization, and the fix is well contained and resolves the reported issue.
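For readers skimming the thread, here is a minimal, self-contained sketch of the fallback the review describes. The enum members and the head_dim check follow the diff fragments quoted below; the helper name, the import path for AttentionBackendEnum, and the explicit return of TORCH_SDPA are illustrative assumptions rather than the PR's exact code.

```python
import logging

# Assumed import path for the backend enum used in the quoted diff fragments.
from vllm.attention.backends.registry import AttentionBackendEnum

logger = logging.getLogger(__name__)


def pick_vision_attn_backend(
    attn_backend: AttentionBackendEnum, head_dim: int
) -> AttentionBackendEnum:
    """Fall back to TORCH_SDPA when flash attention cannot handle head_dim."""
    if (
        attn_backend
        in {AttentionBackendEnum.FLASH_ATTN, AttentionBackendEnum.ROCM_AITER_FA}
        and head_dim % 32 != 0
    ):
        logger.warning(
            "Flash attention backend requires head_dim to be a multiple of 32, "
            "got %d; falling back to TORCH_SDPA.",
            head_dim,
        )
        return AttentionBackendEnum.TORCH_SDPA
    return attn_backend
```

Cosmos-Reason1-7B's ViT uses head_dim=80 (see the reviewer comment below), so this path would trigger and the vision encoder would run under SDPA instead of crashing.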
```python
        AttentionBackendEnum.ROCM_AITER_FA,
    } and self.hidden_size_per_attention_head % 32 != 0:
        logger.warning(
            f"Flash attention backend requires head_dim to be a multiple of 32, "
```
Please fix pre-commit
```python
    if self.attn_backend in {
        AttentionBackendEnum.FLASH_ATTN,
        AttentionBackendEnum.ROCM_AITER_FA,
    } and self.hidden_size_per_attention_head % 32 != 0:
```
I think #28763 has added head_size=80 (used by cosmos-7b's ViT) support to FA. Can you try to install the latest nightly wheel?
Thanks! I have verified that this bug does not occur in the latest nightly build, so I will close this PR and the issue.
```python
    self.use_upstream_fa = False
    # Flash attention requires head_dim to be a multiple of 32
    # Fall back to TORCH_SDPA if the head dimension is incompatible
    if self.attn_backend in {
```
@ECMGit can you skip this check on ROCm for now? We have different conditions there.
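For illustration, one way the check could be scoped to non-ROCm platforms as requested. The current_platform.is_rocm() helper is vLLM's existing platform check; the function name, its structure, and the AttentionBackendEnum import path are assumptions for the sketch, not the final change.

```python
from vllm.platforms import current_platform  # existing vLLM platform helper

# Assumed import path, matching the sketch earlier in the thread.
from vllm.attention.backends.registry import AttentionBackendEnum


def needs_sdpa_fallback(attn_backend: AttentionBackendEnum, head_dim: int) -> bool:
    """Sketch: skip the head_dim % 32 check on ROCm, where constraints differ."""
    if current_platform.is_rocm():
        # ROCm's AITER flash-attention path has its own head_dim rules,
        # so leave backend selection untouched there.
        return False
    return attn_backend == AttentionBackendEnum.FLASH_ATTN and head_dim % 32 != 0
```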
LucasWilkinson left a comment:
+1 to this: https://github.com/vllm-project/vllm/pull/29615/files#r2569351681. FA should support head_dim multiples of 8 now.
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
Fix the issue reported in #29417: the flash attention backend requires head_dim to be a multiple of 32, which crashes the Qwen2.5-VL vision encoder in vllm/model_executor/models/qwen2_5_vl.py. Fall back to the TORCH_SDPA backend as a workaround.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.