
[Diffusion][Feature] CFG parallel support for Qwen-Image#444

Merged
hsliuustc0106 merged 15 commits into vllm-project:main from wtomin:cfg-fix
Jan 6, 2026

Conversation

@wtomin
Contributor

@wtomin wtomin commented Dec 24, 2025


Purpose

This PR adds CFG-parallel support for the Qwen-Image family of models. CFG parallelism runs the positive and negative prompts of classifier-free guidance (CFG) on different devices, then merges the two predictions on a single device to perform the scheduler step.
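For context, the merge step described above can be sketched as follows. This is a minimal illustration of the standard CFG combination, not the code from this PR; in CFG-parallel mode, one rank computes the conditional (positive-prompt) prediction and another the unconditional (negative-prompt) prediction, and the two tensors are gathered onto one device before this step runs.

```python
# Illustrative sketch of the classifier-free guidance (CFG) merge that
# happens after the positive and negative branches finish. In CFG-parallel
# mode the two branches run on different devices and their outputs are
# gathered onto a single device first. Plain lists stand in for tensors;
# this is NOT the PR's actual implementation.

def cfg_merge(cond_pred, uncond_pred, cfg_scale):
    """Standard CFG combination: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]

# With cfg_scale = 1.0 the result equals the conditional prediction;
# larger scales push the output further toward the prompt.
merged = cfg_merge([1.0, 2.0], [0.5, 1.0], cfg_scale=4.0)
# merged == [2.5, 5.0]
```

The scheduler step then consumes the merged prediction on the single device, which is why the two branches must be synchronized once per denoising step.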

Test Plan

  • image generation

python examples/offline_inference/text_to_image/text_to_image.py --cfg_parallel_size 2

python examples/offline_inference/text_to_image/text_to_image.py --cfg_parallel_size 2 --cache_backend tea_cache

  • image edit

python examples/offline_inference/image_to_image/image_edit.py --model "Qwen/Qwen-Image-Edit" --image ./qwen_image_output.png --prompt "turn this coffee cup to a glass of wine" --output output_image_edit.png --num_inference_steps 50 --cfg_scale 4.0 --cfg_parallel_size 2

python examples/offline_inference/image_to_image/image_edit.py --model "Qwen/Qwen-Image-Edit" --image ./qwen_image_output.png --prompt "turn this coffee cup to a glass of wine" --output output_image_edit.png --num_inference_steps 50 --cfg_scale 4.0 --cache_backend tea_cache --cfg_parallel_size 2

python examples/offline_inference/image_to_image/image_edit.py --model "Qwen/Qwen-Image-Edit-2509" --image ./qwen_image_output.png --prompt "turn this coffee cup to a glass of wine" --output output_image_edit.png --num_inference_steps 50 --cfg_scale 4.0 --cfg_parallel_size 2

python examples/offline_inference/image_to_image/image_edit.py --model "Qwen/Qwen-Image-Layered" --image ./qwen_image_output.png --prompt "turn this coffee cup to a glass of wine" --num_inference_steps 50 --cfg_scale 4.0 --layers 2 --color-format "RGBA" --output "layered" --cfg_parallel_size 2

Test Result

| task | model | cfg_parallel_size | time | generated image |
| --- | --- | --- | --- | --- |
| T2I | Qwen/Qwen-Image | 1 | 20.5s | qwen_image_output |
| T2I | Qwen/Qwen-Image | 2 | 13.08s | qwen_image_output |
| I2I | Qwen/Qwen-Image-Edit | 1 | 54.3s | output_image_edit_0 |
| I2I | Qwen/Qwen-Image-Edit | 2 | 29.9s | output_image_edit_0 |
| I2I | Qwen/Qwen-Image-Edit-2509 | 1 | 45.0s | output_image_edit_0 |
| I2I | Qwen/Qwen-Image-Edit-2509 | 2 | 25.5s | output_image_edit_0 |
| I2I | Qwen/Qwen-Image-Layered | 1 | 32.3s | layered_0 layered_1 |
| I2I | Qwen/Qwen-Image-Layered | 2 | 19.3s | layered_0 layered_1 |
| task | cache backend | model | cfg_parallel_size | time | generated image |
| --- | --- | --- | --- | --- | --- |
| T2I | tea_cache | Qwen/Qwen-Image | 1 | 12.5s | qwen_image_output_teacache |
| T2I | tea_cache | Qwen/Qwen-Image | 2 | 10.6s | qwen_image_output_teacache_cfg |
| I2I | tea_cache | Qwen/Qwen-Image-Edit | 1 | 24.35s | output_image_edit |
| I2I | tea_cache | Qwen/Qwen-Image-Edit | 2 | 17.06s | output_image_edit |

Setting:

  • vllm: 0.12.0
  • pytorch: 2.9.0
  • python: 3.12
  • cuda: 12.8

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@ZJY0516
Collaborator

ZJY0516 commented Dec 24, 2025

looking forward to it

@hsliuustc0106
Collaborator

Do we expect it to be merged before the 12/30 release? @wtomin

@wtomin
Contributor Author

wtomin commented Dec 25, 2025

> Do we expect it to be merged before the 12/30 release? @wtomin

I think so. I will get it done today.

@wtomin wtomin marked this pull request as ready for review December 26, 2025 08:25

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@hsliuustc0106
Collaborator

any progress or comment @gcanlin @ZJY0516 @wtomin @SamitHuang

Contributor

@gcanlin gcanlin left a comment


Would CFG parallel perform better than USP? If they are roughly the same, I'd prefer to move this to the next release to ensure the quality of 0.12.0.

@ZJY0516
Collaborator

ZJY0516 commented Dec 30, 2025

> Would CFG parallel perform better than USP? If they are roughly the same, I'd prefer to move this to the next release to ensure the quality of 0.12.0.

They are orthogonal: CFG parallelism splits the positive/negative guidance branches across devices, while USP parallelizes over the sequence dimension, so the two can be combined.

@wtomin wtomin force-pushed the cfg-fix branch 2 times, most recently from 9107039 to b96d386 on January 5, 2026 10:32
@wtomin
Contributor Author

wtomin commented Jan 5, 2026

@hsliuustc0106 The current branch is compatible with tea_cache.

@hsliuustc0106 hsliuustc0106 added the `ready` label (triggers buildkite CI) Jan 5, 2026
@hsliuustc0106
Collaborator

fix precommit please

@hsliuustc0106
Collaborator

@ZJY0516 PTAL final check

@gcanlin
Contributor

gcanlin commented Jan 6, 2026

Also works on NPU. Thanks!

python text_to_image.py --cfg_parallel_size 2

And it achieved the expected speedup.

| model | config | time |
| --- | --- | --- |
| Qwen-Image | --cfg_parallel_size 2 | 58s |
| Qwen-Image | --cfg_parallel_size 1 | 37s |

wtomin added 12 commits January 6, 2026 15:25
Signed-off-by: Didan Deng <[email protected]>
(same sign-off on all 12 commits)
wtomin added 3 commits January 6, 2026 15:27
Signed-off-by: Didan Deng <[email protected]>
(same sign-off on all 3 commits)
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


lgtm

@hsliuustc0106 hsliuustc0106 merged commit 14c04d0 into vllm-project:main Jan 6, 2026
7 checks passed
@david6666666
Collaborator

@wtomin Parameter passing is missing in the online serving scenario; please add it.

Shirley125 pushed a commit to Shirley125/vllm-omni that referenced this pull request Jan 9, 2026
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
sniper35 pushed a commit to sniper35/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026
@wtomin wtomin deleted the cfg-fix branch February 2, 2026 07:24