[Diffusion]: Diffusion Ulysses-Sequence-Parallelism support #189

hsliuustc0106 merged 85 commits into vllm-project:main
Conversation
Nice work. Is the parallel speedup ratio normal compared to diffusers' Ulysses SP?
The DOC check has failed; please resolve all the warnings locally.

Local documentation build:

```shell
pip install -e ".[docs]"
mkdocs build
mkdocs serve
```

First, make sure there are no warnings in the logging messages.
Review thread on `vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py` (outdated; resolved).
```python
vllm_config.parallel_config.data_parallel_size = self.od_config.parallel_config.data_parallel_size

with set_current_omni_diffusion_config(self.od_config):
    with set_current_vllm_config(vllm_config):
        ...
```
Why do we still need vllm_config since we have our own init function?
Because VllmConfig is set by set_current_vllm_config, while set_current_omni_diffusion_config only sets OmniDiffusionConfig. Any suggestions?
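For readers following the thread: `set_current_vllm_config` and `set_current_omni_diffusion_config` each install one config object as ambient state, which is why both context managers are nested here. A minimal sketch of that idiom (names and storage are hypothetical, not the actual vLLM implementation):

```python
from contextlib import contextmanager

_CURRENT_CONFIG = None  # module-level "current config" slot


@contextmanager
def set_current_config(cfg):
    """Install cfg as the ambient config for the duration of the block."""
    global _CURRENT_CONFIG
    prev = _CURRENT_CONFIG
    _CURRENT_CONFIG = cfg
    try:
        yield cfg
    finally:
        _CURRENT_CONFIG = prev  # restore on exit, even on error


def get_current_config():
    """Read whatever config is currently installed (None outside any block)."""
    return _CURRENT_CONFIG
```

Because each setter manages only its own slot, code that depends on both the vLLM config and the OmniDiffusion config has to enter both context managers.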
Unfortunately, diffusers' Ulysses SP on qwen-image has an error; see huggingface/diffusers#12568. Still a work in progress.
@gcanlin PTAL
Nice work! It would be better if NPUWorker could receive the same changes, but it's fine to modify only the GPUWorker. I'm thinking of refactoring the common logic between NPU and GPU into a base worker abstraction in a follow-up PR, which should help reduce duplication in future updates :)
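The base-worker refactor suggested above could look roughly like this. This is only a sketch of the idea; the class and method names are hypothetical and not taken from the codebase:

```python
from abc import ABC, abstractmethod


class BaseDiffusionWorker(ABC):
    """Hypothetical base class holding logic shared by GPU and NPU workers."""

    def init_device(self):
        self._set_device()        # backend-specific: CUDA vs. NPU
        self._init_distributed()  # shared: process groups, parallel state

    @abstractmethod
    def _set_device(self):
        """Each backend picks its own device string/handle."""

    def _init_distributed(self):
        # Shared distributed initialization (Ulysses groups, etc.)
        # would live here instead of being duplicated per backend.
        pass


class GPUWorker(BaseDiffusionWorker):
    def _set_device(self):
        self.device = "cuda"


class NPUWorker(BaseDiffusionWorker):
    def _set_device(self):
        self.device = "npu"
```

Only `_set_device` differs per backend; everything touching parallel state stays in one place.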
Signed-off-by: Didan Deng <[email protected]>
This PR allows users to enable Ulysses attention for diffusion models, e.g., qwen-image and qwen-image-edit. Currently it has only been tested with SDPA attention on H800 GPUs.
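For context, Ulysses sequence parallelism shards the sequence across ranks before attention and uses an all-to-all so that each rank attends over the full sequence for a subset of heads. A communication-free numpy sketch of that reshard (the function name is hypothetical; the real implementation performs a distributed all-to-all):

```python
import numpy as np


def ulysses_reshard(local_chunks, world_size, num_heads):
    """Simulate the pre-attention all-to-all of Ulysses SP.

    local_chunks[r] has shape (seq_len // P, num_heads, head_dim):
    rank r's sequence chunk with ALL heads. The returned list gives
    each rank the FULL sequence but only num_heads // P heads.
    """
    heads_per_rank = num_heads // world_size
    resharded = []
    for r in range(world_size):
        h0 = r * heads_per_rank
        # collect rank r's head slice from every rank's sequence chunk
        full_seq = np.concatenate(
            [chunk[:, h0:h0 + heads_per_rank, :] for chunk in local_chunks],
            axis=0,
        )
        resharded.append(full_seq)
    return resharded
```

After each rank runs attention over its heads on the full sequence, a mirror all-to-all restores the original sequence sharding with all heads.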
Purpose
To support various parallelism inference algorithms, this PR introduces:

- `DiffusionParallelConfig` in `vllm_omni/diffusion/data.py`: configuration for diffusion model distributed execution.
- `vllm_omni/diffusion/distributed`: handles the communication groups of different parallel configurations.
- `tests/diffusion/attention/test_ulysses_sequence_parallel.py`: UT for Ulysses attention and multi-layer Ulysses attention.

This PR also edits:

- `vllm_omni/diffusion/attention/layer.py`: allow `Attention` to accept Ulysses attention kwargs and support Ulysses attention in the forward function;
- `vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py`: chunk hidden_states and the image position embedding;
- `vllm_omni/diffusion/worker/gpu_worker.py`: replace vLLM's `init_distributed_environment` and `initialize_model_parallel` with vllm_omni's equivalents.

Test Plan
UTs:

- `pytest tests/diffusion/attention/test_ulysses_sequence_parallel.py`
- `pytest tests/diffusion/distributed/test_comm.py`
- `tests/e2e/offline_inference/test_sequence_parallel.py`

T2I inference:
`python examples/offline_inference/text_to_image/text_to_image.py --ulysses_degree 2`

Test Result
Fast UTs: all passed.
T2I inference
I tried to test Ulysses attention with diffusers' `ContextParallelConfig(ulysses_degree=2)` on qwen-image, but got an error; refer to huggingface/diffusers#12568. The diffusers community is working on solving it.

To measure the parallelism methods, we ran benchmarks with the Qwen/Qwen-Image model, generating 2048x2048 images (a long-sequence input) with 50 inference steps. The hardware devices are NVIDIA H800 GPUs.
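The `sdpa` backend used in these benchmarks computes standard scaled dot-product attention, softmax(QKᵀ/√d)·V per head. A minimal single-head numpy sketch of that computation (an illustration, not the PyTorch kernel):

```python
import numpy as np


def sdpa(q, k, v):
    """Scaled dot-product attention for one head; q, k, v are (S, D)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (S, S) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                            # (S, D) weighted values
```

Under Ulysses SP, each rank runs exactly this computation for its subset of heads over the full (gathered) sequence.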
`sdpa` is the attention backend.

Discussion
- Should we move `vllm_omni/diffusion/distributed` to `vllm_omni/distributed` in this PR?

Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)