[diffusion] use fa3 by default when device supports it #783
SamitHuang merged 11 commits into vllm-project:main
Conversation
Signed-off-by: zjy0516 <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecd0d5378e
```python
if os.environ.get("DIFFUSION_ATTENTION_BACKEND") is not None:
    backend_name = os.environ.get("DIFFUSION_ATTENTION_BACKEND")
# ...
if backend_name is not None:
```
Initialize backend_name before conditional selection
When neither `DIFFUSION_ATTENTION_BACKEND` is set nor the CUDA compute capability falls in the 8.x–9.x range, `backend_name` is never assigned, but it is still referenced in the `if backend_name is not None:` check. This raises `UnboundLocalError` and prevents the function from falling back to SDPA on CPU or older GPUs. Initialize `backend_name` to `None` (as in the previous version) before the conditional selection to avoid this runtime crash.
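The suggested fix can be sketched as a minimal standalone function (the env var and SDPA fallback come from the thread; the `"FLASH_ATTN"` label, signature, and capability cutoffs here are illustrative, not the merged code):

```python
import os


def select_backend_name(device_type, compute_capability):
    # Initialize first so the final check can never raise UnboundLocalError.
    backend_name = os.environ.get("DIFFUSION_ATTENTION_BACKEND")

    if backend_name is None and device_type == "cuda" and compute_capability is not None:
        major, _ = compute_capability
        if major in (8, 9):  # sm80-sm90: FA3 forward wheel is available
            backend_name = "FLASH_ATTN"

    # None here means the caller falls back to SDPA (CPU / older GPUs).
    return backend_name
```

With this ordering, the CPU/older-GPU path simply returns `None` instead of crashing.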
```python
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import torch
from fa3_fwd_interface import flash_attn_func
```
Will this lead to an import error on NPUs?
It should be handled in selector.py, which means we will not import this file on NPU.
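One way a selector can keep the CUDA-only import off the NPU path is to resolve module names before importing anything; this is a hedged sketch (the lookup table and function names are illustrative, not the actual selector.py):

```python
import importlib

_BACKEND_MODULES = {
    "cuda": "fa3_fwd_interface",     # CUDA-only; safe because it is never named on NPU
    "npu": "torch.nn.functional",    # fall back to SDPA
    "cpu": "torch.nn.functional",
}


def backend_module_name(device_type):
    """Resolve which module the selector would import for this device.

    Keeping the CUDA-only module behind this lookup means NPU/CPU runs
    never execute `import fa3_fwd_interface` at all.
    """
    return _BACKEND_MODULES.get(device_type, "torch.nn.functional")


def load_backend(device_type):
    # The import happens only here, after device detection.
    return importlib.import_module(backend_module_name(device_type))
```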
```python
backend_name: str | None = os.environ.get("DIFFUSION_ATTENTION_BACKEND")

if detect_device_type() == "cuda":
    compute_capability = torch.cuda.get_device_capability()
```
I have seen an FA3 support issue in flash_attn on Blackwell devices, such as GB200 chips.
Do you think we should keep both FA2 and FA3 and select between them accordingly?
I plan to keep FA2 and FA3. For Blackwell, fall back to SDPA or something else.
For FA2, I don't have the bandwidth to build and upload it right now; I'll do this later.
FYI, I found FA3 also supports sm80, so I don't plan to maintain FA2 now.
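The capability-based selection discussed in this thread can be sketched as follows (the cutoffs and backend labels are assumptions based on the discussion above, not the merged code):

```python
def pick_fa_backend(compute_capability):
    """Map CUDA compute capability to an attention backend name.

    Per the thread: FA3 wheels cover sm80 (Ampere) through sm90 (Hopper),
    while Blackwell (sm100+) falls back to SDPA until FA3 supports it there.
    """
    major, _ = compute_capability
    if 8 <= major <= 9:
        return "FA3"
    return "SDPA"
```

Covering sm80 with FA3 is what makes a separate FA2 build unnecessary here.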
It seems the CI failed due to acc.
```python
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import torch
from fa3_fwd_interface import flash_attn_func, flash_attn_varlen_func
```
@ZJY0516 can we use this helper function from upstream now that we are at v0.14.0?

```python
from vllm.v1.attention.backends.fa_utils import flash_attn_varlen_func
```
But upstream doesn't provide `flash_attn_func`, so we still need `fa3_fwd`.
Purpose
I have uploaded an FA3 package to PyPI that only contains the forward kernel.
Now users can install it automatically.
https://pypi.org/project/fa3-fwd/
https://github.com/ZJY0516/fa3-fwd
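A minimal, hedged availability check for the package (the module name `fa3_fwd_interface` is taken from the diff in this PR; installation is via `pip install fa3-fwd`):

```python
# pip install fa3-fwd   (forward-only FA3 kernels, per the PR description)
try:
    from fa3_fwd_interface import flash_attn_func  # noqa: F401
    HAS_FA3 = True
except ImportError:
    # Wheel missing or unsupported platform: the caller should fall back to SDPA.
    HAS_FA3 = False
```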
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model.