13 changes: 13 additions & 0 deletions vllm/model_executor/models/qwen2_5_vl.py
@@ -362,6 +362,19 @@
self.use_upstream_fa = True
if current_platform.is_xpu():
self.use_upstream_fa = False
# Flash attention requires head_dim to be a multiple of 32
# Fall back to TORCH_SDPA if the head dimension is incompatible
if self.attn_backend in {
Collaborator: @ECMGit can you skip this check on rocm for now? We have different conditions.

AttentionBackendEnum.FLASH_ATTN,
AttentionBackendEnum.ROCM_AITER_FA,
} and self.hidden_size_per_attention_head % 32 != 0:
Member: I think #28763 has added head_size=80 (used by cosmos-7b's ViT) support to FA. Can you try to install the latest nightly wheel?

Author: Thanks! I have verified that this bug does not occur in the latest nightly build, so this PR and the issue can be closed.

logger.warning(
f"Flash attention backend requires head_dim to be a multiple of 32, "
Member: Please fix pre-commit

f"but got {self.hidden_size_per_attention_head}. "
f"Falling back to TORCH_SDPA backend."

Check failure on line 374 in vllm/model_executor/models/qwen2_5_vl.py (GitHub Actions / pre-commit):
Ruff (G004): vllm/model_executor/models/qwen2_5_vl.py:372:17: G004 Logging statement uses f-string
)
self.attn_backend = AttentionBackendEnum.TORCH_SDPA
self.flash_attn_varlen_func = None
self.is_flash_attn_backend = self.attn_backend in {
AttentionBackendEnum.FLASH_ATTN,
AttentionBackendEnum.ROCM_AITER_FA,
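
A minimal sketch of how the pre-commit failure flagged above could be addressed, assuming the surrounding method body stays as in the diff: Ruff's G004 rule rejects f-strings inside logging calls, so the warning can be rewritten with lazy %-style formatting. The names AttentionBackendEnum, logger, and self.hidden_size_per_attention_head are taken from the diff context, not introduced here.

# Sketch only: same fallback logic as the diff, with the logger.warning call
# converted from an f-string to lazy %-formatting so Ruff G004 passes.
if (
    self.attn_backend
    in {
        AttentionBackendEnum.FLASH_ATTN,
        AttentionBackendEnum.ROCM_AITER_FA,
    }
    and self.hidden_size_per_attention_head % 32 != 0
):
    logger.warning(
        "Flash attention backend requires head_dim to be a multiple of 32, "
        "but got %d. Falling back to TORCH_SDPA backend.",
        self.hidden_size_per_attention_head,
    )
    self.attn_backend = AttentionBackendEnum.TORCH_SDPA
    self.flash_attn_varlen_func = None

Note that per the discussion above, the check may be unnecessary on newer builds (head_size=80 support in FA) and may need to be skipped on ROCm, so this snippet only addresses the lint failure, not those review points.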