
[Diffusion]: Diffusion Ulysses-Sequence-Parallelism support#189

Merged
hsliuustc0106 merged 85 commits into vllm-project:main from
wtomin:usp
Dec 17, 2025

Conversation

@wtomin
Copy link
Contributor

@wtomin wtomin commented Dec 4, 2025

This PR allows users to enable Ulysses Attention for diffusion models, e.g., qwen-image and qwen-image-edit. It has currently only been tested with SDPA attention on H800 GPUs.

Purpose

To support various parallel inference algorithms, this PR introduces:

  • DiffusionParallelConfig in vllm_omni/diffusion/data.py: configuration for diffusion model distributed execution.
  • vllm_omni/diffusion/distributed: handles the communication groups for the different parallel configurations.
  • tests/diffusion/attention/test_ulysses_sequence_parallel.py: unit tests for Ulysses attention and multi-layer Ulysses attention.
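For context, the core Ulysses trick is an all-to-all that trades a sequence shard for a head shard before attention (and an inverse exchange after). Below is a minimal single-process sketch of the first exchange, with NumPy standing in for `torch.distributed.all_to_all`; the function and variable names are illustrative, not the PR's actual APIs:

```python
import numpy as np

def ulysses_seq_to_head(shards, sp_degree):
    """Simulate the pre-attention Ulysses all-to-all.

    Each rank starts with (seq_len / sp, num_heads, head_dim) and ends
    with (seq_len, num_heads / sp, head_dim): the full sequence, but
    only its own slice of the attention heads.
    """
    heads_per_rank = shards[0].shape[1] // sp_degree
    out = []
    for dst in range(sp_degree):
        # Rank `dst` receives head slice `dst` from every sequence shard,
        # then concatenates the pieces back into the full sequence.
        pieces = [s[:, dst * heads_per_rank:(dst + 1) * heads_per_rank, :]
                  for s in shards]
        out.append(np.concatenate(pieces, axis=0))
    return out

# Toy check: seq_len=8, num_heads=4, head_dim=2, sp_degree=2.
full = np.arange(8 * 4 * 2, dtype=np.float32).reshape(8, 4, 2)
shards = [full[:4], full[4:]]               # per-rank sequence shards
gathered = ulysses_seq_to_head(shards, 2)
assert gathered[0].shape == (8, 2, 2)       # full seq, half the heads
assert np.array_equal(gathered[0], full[:, :2, :])
assert np.array_equal(gathered[1], full[:, 2:, :])
```

After local attention over its head slice, each rank runs the inverse exchange to return to sequence shards; in the real code this happens inside the attention layer's forward pass.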

This PR also edits:

  • vllm_omni/diffusion/attention/layer.py: allows Attention to accept Ulysses attention kwargs and supports Ulysses attention in the forward function;
  • vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py: chunks hidden_states and the image position embedding;
  • vllm_omni/diffusion/worker/gpu_worker.py: replaces vllm's init_distributed_environment and initialize_model_parallel with vllm_omni's equivalents.
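The transformer-side change boils down to giving each rank a contiguous chunk of the sequence, plus the matching slice of the position embedding, before the blocks run. A hedged sketch of that chunking, again with NumPy standing in for tensors (the helper name is illustrative):

```python
import numpy as np

def shard_sequence(x, rank, sp_degree):
    """Return this rank's contiguous chunk along the sequence axis (axis 0).

    Ulysses SP assumes seq_len is divisible by sp_degree; real code
    would pad or raise a clearer error otherwise.
    """
    seq_len = x.shape[0]
    assert seq_len % sp_degree == 0, "seq_len must divide evenly"
    chunk = seq_len // sp_degree
    return x[rank * chunk:(rank + 1) * chunk]

hidden_states = np.zeros((16, 8))   # (seq_len, hidden_dim) toy tensor
pos_ids = np.arange(16)             # per-token position indices
local_h = shard_sequence(hidden_states, rank=1, sp_degree=2)
local_pos = shard_sequence(pos_ids, rank=1, sp_degree=2)
assert local_h.shape == (8, 8)
assert local_pos.tolist() == list(range(8, 16))
```

Chunking the position embedding alongside the hidden states keeps each token paired with its original position, so attention after the all-to-all sees a consistent full sequence.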

Test Plan

UTs:

  • fast UT: pytest tests/diffusion/attention/test_ulysses_sequence_parallel.py
  • fast UT: pytest tests/diffusion/distributed/test_comm.py
  • slow UT: tests a small random diffusion model via tests/e2e/offline_inference/test_sequence_parallel.py

T2I inference:
python examples/offline_inference/text_to_image/text_to_image.py --ulysses_degree 2

Test Result

  • fast UT
    all passed.

  • T2I inference

    • attention backend: sdpa
    • GPU: H800
    • image resolution: 1024x1024
| num_gpus | ulysses degree | sec/img | generated image | script |
| --- | --- | --- | --- | --- |
| 1 | - | 19.14 | qwen_image_output | `python examples/offline_inference/text_to_image/text_to_image.py` |
| 2 | 2 | 16.08 | qwen_image_output_sp | `python examples/offline_inference/text_to_image/text_to_image.py --ulysses_degree 2` |

I tried to test Ulysses attention with diffusers' ContextParallelConfig(ulysses_degree=2) on qwen-image, but got an error; see huggingface/diffusers#12568. The Diffusers community is working on a fix.

To measure the parallelism methods, we benchmarked the Qwen/Qwen-Image model generating 2048x2048 images (a long-sequence input) with 50 inference steps on NVIDIA H800 GPUs, using the sdpa attention backend.

| Configuration | Ulysses degree | Generation Time | Speedup |
| --- | --- | --- | --- |
| Baseline (diffusers) | - | 112.5s | 1.0x |
| Ulysses-SP | 2 | 65.2s | 1.73x |
| Ulysses-SP | 4 | 39.6s | 2.84x |
| Ulysses-SP | 8 | 30.8s | 3.65x |
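Dividing the speedups in the table by the Ulysses degree gives the parallel efficiency, which makes the growing communication overhead at higher degrees visible (times taken directly from the table above):

```python
baseline_s = 112.5                       # diffusers baseline, seconds
ulysses_s = {2: 65.2, 4: 39.6, 8: 30.8}  # degree -> generation time

for degree, t in ulysses_s.items():
    speedup = baseline_s / t
    efficiency = speedup / degree
    print(f"degree={degree}: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} efficiency")
# degree=2: 1.73x (86%), degree=4: 2.84x (71%), degree=8: 3.65x (46%)
```

The drop from 86% to 46% efficiency as the degree grows from 2 to 8 is the expected cost of the all-to-all exchanges taking a larger share of each step.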

Discussion

  1. Is it necessary to move vllm_omni/diffusion/distributed to vllm_omni/distributed in this PR?

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@wtomin wtomin force-pushed the usp branch 8 times, most recently from 90138d4 to d0916d9 Compare December 10, 2025 08:38
@wtomin wtomin marked this pull request as ready for review December 10, 2025 11:13
@wtomin wtomin changed the title [Diffusion][WIP]: Diffusion Model Parallelism support [Diffusion]: Diffusion Model Parallelism support Dec 10, 2025
@wtomin wtomin changed the title [Diffusion]: Diffusion Model Parallelism support [Diffusion]: Diffusion Ulysses-Sequence-Parallelism support Dec 10, 2025
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@SamitHuang
Copy link
Collaborator

nice work. is the parallel speedup ratio normal compared to diffusers' ulysses sp?

@congw729
Copy link
Contributor

congw729 commented Dec 11, 2025

The DOC check has failed; please resolve all the warnings locally.

Local Documentation Build

pip install -e ".[docs]"
mkdocs build
mkdocs serve

First, make sure there are no warnings in the log messages.
Then open http://127.0.0.1:8000/ in a browser to verify that all navigation links and content display properly.

vllm_config.parallel_config.data_parallel_size = self.od_config.parallel_config.data_parallel_size

with set_current_omni_diffusion_config(self.od_config):
with set_current_vllm_config(vllm_config):
Copy link
Collaborator


Why do we still need vllm_config, since we have our own init function?

Copy link
Contributor Author


Because VllmConfig is set in set_current_vllm_config, while set_current_omni_diffusion_config only sets OmniDiffusionConfig. Any suggestions?

@wtomin wtomin force-pushed the usp branch 2 times, most recently from dc569ac to ecc925c Compare December 12, 2025 09:52
@wtomin
Copy link
Contributor Author

wtomin commented Dec 12, 2025

nice work. is the parallel speedup ratio normal compared to diffusers' ulysses sp?

Unfortunately, diffusers' Ulysses SP on qwen-image raises an error; see huggingface/diffusers#12568. It is still a work in progress.

@hsliuustc0106
Copy link
Collaborator

@gcanlin PTAL

@gcanlin
Copy link
Contributor

gcanlin commented Dec 13, 2025

Nice work! It would be better if NPUWorker could receive the same changes, but it's fine to modify only the GPUWorker. I'm thinking of refactoring the common logic between NPU and GPU into a base worker abstraction in a follow-up PR, which should help reduce duplication in future updates :)
I'll test the performance gains on NPU first using this PR.

Signed-off-by: Didan Deng <[email protected]>
@wtomin wtomin force-pushed the usp branch 2 times, most recently from 6a42082 to 6f8acc9 Compare December 17, 2025 07:01
Copy link
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


lgtm

@hsliuustc0106 hsliuustc0106 merged commit 85f0c11 into vllm-project:main Dec 17, 2025
4 checks passed
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

Labels

ready label to trigger buildkite CI


6 participants