fix: guard flash-attn rotary import by he-yufeng · Pull Request #42679 · vllm-project/vllm

he-yufeng · 2026-05-14T20:57:45Z

Summary

import flash_attn.ops.triton.rotary directly when vLLM is not on CPU
treat a missing rotary module as unavailable instead of crashing during ApplyRotaryEmb construction
keep the existing fallback path when FA4 no longer provides the old rotary module

FA4 can leave the flash_attn root package importable while moving or removing flash_attn.ops.triton.rotary. Checking only the root package therefore selects a fast path that fails during import. The branch now follows the maintainer-requested direct import shape and catches ModuleNotFoundError around that exact import.

Duplicate work check

I searched open, closed, and recently merged vLLM PRs for #42675, flash_attn.ops.triton.rotary, ApplyRotaryEmb, and the FA4 import failure before opening and updating this PR. I did not find another active fix for this path.

AI assistance disclosure

I used OpenAI Codex to help inspect the repository, update the implementation after review, and run local validation. I reviewed the final diff and can explain the failure mode and fallback behavior. The commits include the required Assisted-by attribution trailers.

Reproduce on `main`

With an FA4 environment where flash_attn exists but flash_attn.ops.triton.rotary does not, run:

vllm bench latency \
  --model=Qwen/Qwen3-1.7B \
  --quantization=fp8 \
  --attention-backend=TRITON_ATTN \
  --input-len=1024 \
  --output-len=128 \
  --batch-size=8

main imports the missing rotary module after probing only the root flash_attn package, which raises ModuleNotFoundError. On this branch, the direct rotary import is guarded and the existing fallback remains active.
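
The failure mode can be illustrated without an FA4 install. The following is a minimal sketch, not the vLLM code: `fake_fa_demo` is a hypothetical throwaway package created in a temporary directory, standing in for a `flash_attn` root package whose `ops.triton.rotary` module is gone. It shows that a root-only `find_spec` probe succeeds while the nested import still raises `ModuleNotFoundError`.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a flash_attn install whose rotary module moved.
PKG = "fake_fa_demo"
SUBMODULE = f"{PKG}.ops.triton.rotary"

with tempfile.TemporaryDirectory() as tmp:
    # Create only the root package; the nested rotary module is absent.
    root = Path(tmp) / PKG
    root.mkdir()
    (root / "__init__.py").write_text("")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The root-only probe succeeds because the root package exists.
        root_found = importlib.util.find_spec(PKG) is not None

        # Importing the nested module still fails.
        try:
            importlib.import_module(SUBMODULE)
            import_error = None
        except ModuleNotFoundError as exc:
            import_error = exc
    finally:
        # Undo the path and module-cache changes made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop(PKG, None)

print(root_found)                    # True
print(type(import_error).__name__)   # ModuleNotFoundError
```

This is why a guard keyed only on the root package can select a fast path that then fails at import time.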

Validation

Passed locally on Windows:

python -m py_compile vllm/model_executor/layers/rotary_embedding/common.py
python -m ruff check vllm/model_executor/layers/rotary_embedding/common.py
python -m ruff format --check vllm/model_executor/layers/rotary_embedding/common.py
git diff --check

The previous branch head also passed the vLLM Buildkite PR suite. I do not have the required FA4/GPU environment to rerun the full latency reproduction locally.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request updates the ApplyRotaryEmb class to check for the specific submodule flash_attn.ops.triton.rotary instead of just the base flash_attn package before attempting to import it. This ensures that the rotary embedding logic correctly handles environments where the base package is present but the required Triton operations are missing. Additionally, a unit test was added to verify this behavior using mocks. I have no feedback to provide as there were no review comments.

yewentao256 · 2026-05-14T21:28:48Z

+def test_apply_rotary_emb_skips_flash_attn_without_rotary_module(monkeypatch):
+    monkeypatch.setattr(rotary_common.current_platform, "is_cpu", lambda: False)
+
+    def fake_find_spec(name):
+        if name == "flash_attn":
+            return object()
+        if name == "flash_attn.ops.triton.rotary":
+            return None
+        raise AssertionError(f"unexpected import probe: {name}")
+
+    monkeypatch.setattr(rotary_common, "find_spec", fake_find_spec)
+
+    compilation_config = CompilationConfig(custom_ops=["all"])
+    monkeypatch.setattr(
+        custom_op_module,
+        "get_cached_compilation_config",
+        lambda: compilation_config,
+    )
+
+    op = rotary_common.ApplyRotaryEmb()
+
+    assert op.apply_rotary_emb_flash_attn is None
+
+


Suggested change

def test_apply_rotary_emb_skips_flash_attn_without_rotary_module(monkeypatch):
    monkeypatch.setattr(rotary_common.current_platform, "is_cpu", lambda: False)

    def fake_find_spec(name):
        if name == "flash_attn":
            return object()
        if name == "flash_attn.ops.triton.rotary":
            return None
        raise AssertionError(f"unexpected import probe: {name}")

    monkeypatch.setattr(rotary_common, "find_spec", fake_find_spec)

    compilation_config = CompilationConfig(custom_ops=["all"])
    monkeypatch.setattr(
        custom_op_module,
        "get_cached_compilation_config",
        lambda: compilation_config,
    )

    op = rotary_common.ApplyRotaryEmb()

    assert op.apply_rotary_emb_flash_attn is None

Thanks for the work! For an update this small, we don't need a dedicated unit test.

joonyoo181 · 2026-06-01T21:43:48Z

@he-yufeng Are you able to finish this? Otherwise, I can take over.

he-yufeng · 2026-06-01T22:15:30Z

I pushed d42c6dc and removed the dedicated unit test per review, keeping the import guard focused on the production change. Local validation: py_compile, ruff check, ruff format --check, and git diff --check. CI is running now.

yewentao256

LGTM, thanks for the work!

Please also add the full reproduction command on main and the resulting error report to the PR description, then show that the unit tests pass on this branch.

he-yufeng · 2026-06-04T03:33:52Z

Rebased the branch onto the latest upstream/main and kept the PR scoped to the production import guard only.

Validation run locally:

python -m py_compile vllm/model_executor/layers/rotary_embedding/common.py: passed
python -m ruff check vllm/model_executor/layers/rotary_embedding/common.py: passed
python -m ruff format --check vllm/model_executor/layers/rotary_embedding/common.py: passed
git diff --check upstream/main..HEAD: passed

he-yufeng · 2026-06-04T06:51:28Z

Pushed ed20d62ee to fix the Buildkite failure on this branch.

Root cause from the failed Buildkite logs:

ModuleNotFoundError: No module named 'flash_attn'
...
find_spec("flash_attn.ops.triton.rotary")

importlib.util.find_spec() can raise ModuleNotFoundError when the parent package is absent. The branch now wraps that spec probe in _has_flash_attn_rotary() and treats a missing parent package as "rotary module unavailable", while still leaving the actual apply_rotary import explicit when the spec exists.

Validation run locally:

python -m py_compile vllm/model_executor/layers/rotary_embedding/common.py
python -m ruff check vllm/model_executor/layers/rotary_embedding/common.py
python -m ruff format --check vllm/model_executor/layers/rotary_embedding/common.py
git diff --check

I also reproduced the failing condition with the same helper logic in a local Python snippet: missing flash_attn now returns False instead of raising.
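
The body of `_has_flash_attn_rotary()` is not shown in this thread, so the following is only a sketch of the behavior described above. `has_module` and the `no_such_pkg_for_demo` package name are hypothetical, and the standard-library `json` package stands in for an importable parent.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if `name` can be located, False if it or a parent is missing."""
    try:
        return find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent package first; a missing parent raises here.
        return False


# Hypothetical dotted name whose top-level package does not exist.
MISSING_PARENT = "no_such_pkg_for_demo.ops.triton.rotary"

# The unguarded probe raises when the parent package is absent.
try:
    find_spec(MISSING_PARENT)
    raw_probe_raised = False
except ModuleNotFoundError:
    raw_probe_raised = True

print(raw_probe_raised)                                # True
print(has_module(MISSING_PARENT))                      # False
print(has_module("json.no_such_submodule_for_demo"))   # False: parent exists, child does not
print(has_module("json.decoder"))                      # True
```

The two `False` cases arrive by different routes: a missing parent raises and is caught, while a missing child under an importable parent makes `find_spec` return `None`.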

yewentao256 · 2026-06-04T15:43:33Z


        self.apply_rotary_emb_flash_attn = None
-        if not current_platform.is_cpu() and find_spec("flash_attn") is not None:
+        if not current_platform.is_cpu() and _has_flash_attn_rotary():


Suggested change

-        if not current_platform.is_cpu() and _has_flash_attn_rotary():
+        if not current_platform.is_cpu() and find_spec("flash_attn.ops.triton.rotary") is not None:

Why can't we do like this?

find_spec("flash_attn.ops.triton.rotary") is not None was the first shape I tried, but it still raises when the parent package is absent.

The Buildkite failure on the previous head was exactly that case:

ModuleNotFoundError: No module named 'flash_attn' ... find_spec("flash_attn.ops.triton.rotary")

find_spec() only returns None when the searched module is absent under an importable parent package. If flash_attn itself is not installed, it raises ModuleNotFoundError while resolving the parent. The helper keeps the production guard to the same check, but catches that missing-parent case and treats it as unavailable.

I see. Please also add the full reproduction command on main and the resulting error report to the PR description, then show that the same command passes on this branch. I will take a look.

he-yufeng · 2026-06-07T02:04:03Z

Updated the PR description with the requested main-branch repro command, error text, branch behavior, and current validation/CI status.

yewentao256

Thanks @he-yufeng
It seems your issue has already been fixed on main, or there is a problem with your current environment.

[yewentao256@nm-frk-h200-03-preserve vllm-source]$ python - <<'PY'
from vllm.model_executor.layers.rotary_embedding.common import ApplyRotaryEmb

ApplyRotaryEmb(
    rotary_dim=128,
    neox_style=True,
    head_size=128,
    is_segmented=False,
)
PY
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: ApplyRotaryEmb.__init__() got an unexpected keyword argument 'rotary_dim'

The error comes from your command itself, not from a missing import.

yewentao256 · 2026-06-07T19:08:12Z

Closing this PR as not scheduled. Feel free to reopen if I am wrong.

joonyoo181 · 2026-06-08T18:28:58Z

@yewentao256 I don't think main fixes the issue. The problem is that the code calls find_spec("flash_attn") to decide whether the flash_attn.ops.triton.rotary module exists. With FA4, flash_attn exists but flash_attn.ops.triton.rotary was moved elsewhere, so the subsequent import raises an error. You need FA4 installed in your environment for the command to trigger the import error.

check #42675

yewentao256 · 2026-06-08T19:17:40Z

@joonyoo181 Please update FA4 to a newer version, for example:

pip show flash-attn-4
Name: flash-attn-4
Version: 4.0.0b16
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: 
License: BSD 3-Clause License
Location: /home/yewentao256/.venv/lib/python3.12/site-packages
Requires: apache-tvm-ffi, einops, nvidia-cutlass-dsl, quack-kernels, torch, torch-c-dlpack-ext, typing_extensions
Required-by:

Or uninstall it, vLLM has its own FA compilation.

yewentao256 · 2026-06-08T20:58:29Z

Oops, this can be reproduced with: vllm bench latency --model=Qwen/Qwen3-1.7B --quantization=fp8 --attention-backend=TRITON_ATTN --input-len=1024 --output-len=128 --batch-size=8

yewentao256 · 2026-06-08T21:04:41Z

@@ -135,7 +142,7 @@ def __init__(
        self.enable_fp32_compute = enable_fp32_compute

        self.apply_rotary_emb_flash_attn = None
-        if not current_platform.is_cpu() and find_spec("flash_attn") is not None:
+        if not current_platform.is_cpu() and _has_flash_attn_rotary():


Suggested change

-        if not current_platform.is_cpu() and _has_flash_attn_rotary():
+        if not current_platform.is_cpu():
+            try:
+                apply_rotary = import_module(
+                    "flash_attn.ops.triton.rotary"
+                ).apply_rotary
+            except ModuleNotFoundError:
+                apply_rotary = None

I'd prefer this fix

he-yufeng · 2026-06-09T10:25:42Z

Implemented the direct-import approach in f53d644. The code now attempts import_module("flash_attn.ops.triton.rotary").apply_rotary only on non-CPU platforms and leaves the existing fallback selected when that exact module is missing. I used contextlib.suppress(ModuleNotFoundError) because the repository's ruff configuration rejects a bare try/except/pass with SIM105.

I also added the repository-required AI assistance disclosure and Assisted-by commit trailers. Local py_compile, ruff check, ruff format check, and diff check pass.
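
For readers unfamiliar with the pattern, here is a minimal standalone sketch of a guarded direct import using contextlib.suppress. It is not the code merged in f53d644: `load_optional_attr` and the `no_such_pkg_for_demo` module name are hypothetical, and `math.sqrt` stands in for `apply_rotary`.

```python
from contextlib import suppress
from importlib import import_module


def load_optional_attr(module_name: str, attr: str):
    """Return `module.attr`, or None if the import raises ModuleNotFoundError."""
    result = None
    with suppress(ModuleNotFoundError):
        # If the import raises, the assignment is skipped and result stays None.
        result = getattr(import_module(module_name), attr)
    return result


# An importable module yields the requested attribute.
sqrt = load_optional_attr("math", "sqrt")
print(sqrt(9.0))  # 3.0

# A missing module (hypothetical name) leaves the fallback value in place.
missing = load_optional_attr("no_such_pkg_for_demo.rotary", "apply_rotary")
print(missing)  # None
```

Two caveats apply to this sketch: `suppress` also swallows a `ModuleNotFoundError` raised by imports inside the target module, and an `AttributeError` from a module that exists but lacks the attribute is not suppressed.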

yewentao256

LGTM, thanks for the work and iterations!

he-yufeng · 2026-06-10T13:21:34Z

Checked the remaining red Buildkite shard. The failing job is multi-modal-models-standard-1-qwen2, and the root error is in Qwen/Qwen2.5-VL-3B-Instruct processor loading:

OSError: Can't load image processor for 'Qwen/Qwen2.5-VL-3B-Instruct'

The failure is in tests/models/multimodal/generation/test_common.py::test_multi_image_models[qwen2_5_vl-test_case47], with repeated Insufficient Permissions warnings while detecting available memory. The CPU/basic/model-executor/rotary-adjacent shards are green, and this does not look related to the rotary_embedding/common.py fallback change in this PR.

Could someone rerun that Buildkite shard or the PR build?

yewentao256

Let's retry once before force merge

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

he-yufeng requested review from WoosukKwon, mgoin, tlrmchlsmth, yewentao256 and zyongye as code owners May 14, 2026 20:57

claude Bot reviewed May 14, 2026

he-yufeng force-pushed the fix/flash-attn-rotary-spec branch from eff8da3 to d53627d Compare May 14, 2026 20:58

gemini-code-assist Bot reviewed May 14, 2026

yewentao256 reviewed May 14, 2026

Lidang-Jiang mentioned this pull request May 23, 2026

[Bugfix] Fix ImportError for flash_attn < v2.1.2 missing triton rotary module #38091

Closed

5 tasks

he-yufeng force-pushed the fix/flash-attn-rotary-spec branch from d53627d to d42c6dc Compare June 1, 2026 22:14

yewentao256 previously approved these changes Jun 3, 2026

yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 3, 2026

he-yufeng force-pushed the fix/flash-attn-rotary-spec branch from d42c6dc to 1114322 Compare June 4, 2026 03:33

he-yufeng force-pushed the fix/flash-attn-rotary-spec branch from 1114322 to ed20d62 Compare June 4, 2026 06:51

yewentao256 reviewed Jun 4, 2026

yewentao256 reviewed Jun 7, 2026

yewentao256 closed this Jun 7, 2026

yewentao256 reopened this Jun 8, 2026

yewentao256 reviewed Jun 8, 2026

he-yufeng added 2 commits June 9, 2026 18:23

fix: guard flash-attn rotary import

3652de6

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Assisted-by: OpenAI Codex

fix: import flash-attn rotary defensively

f53d644

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Assisted-by: OpenAI Codex

he-yufeng force-pushed the fix/flash-attn-rotary-spec branch from ed20d62 to f53d644 Compare June 9, 2026 10:23

yewentao256 approved these changes Jun 9, 2026

yewentao256 reviewed Jun 10, 2026

Merge branch 'main' into fix/flash-attn-rotary-spec

365c6db

yewentao256 merged commit 1f60771 into vllm-project:main Jun 11, 2026
72 checks passed
