
[New model] Support model qwen image layered#381

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from Bounty-hunter:support_qwen_image_layered
Dec 20, 2025

Conversation

Contributor

@Bounty-hunter Bounty-hunter commented Dec 19, 2025


Purpose

Support the Qwen-Image-Layered model.

Run with image_edit.py; the parameters most relevant to this model are --color-format "RGBA" and --layers x.

Test Plan

(1) run with vllm_omni

python image_edit.py --model "Qwen/Qwen-Image-Layered"   --image xxx --output "image_1_layered_" --num_inference_step 50 --cfg_scale 4.0 --layers x --prompt "" --color-format "RGBA"  --seed 777

(2) run with diffusers

from diffusers import QwenImageLayeredPipeline
import torch
from PIL import Image

pipeline = QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered")
pipeline = pipeline.to("cuda", torch.bfloat16)
pipeline.set_progress_bar_config(disable=None)

image = Image.open("/home/d00806799/code/vllm-omni/examples/offline_inference/image_to_image/vllm.jpg").convert("RGBA")
inputs = {
    "image": image,
    "generator": torch.Generator(device='cuda').manual_seed(777),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "layers": 3,
    "resolution": 640,      # Using different bucket (640, 1024) to determine the resolution. For this version, 640 is recommended
    "cfg_normalize": False,  # Whether enable cfg normalization.
    "use_en_prompt": False,  # Automatic caption language if user does not provide caption
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]

for i, image in enumerate(output_image):
    image.save(f"{i}.png")
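Each saved file above is one RGBA layer. To check that the layers recombine into the full picture, they can be alpha-composited bottom-up with the Porter-Duff "over" operator (on real images, PIL's Image.alpha_composite applies the same rule per pixel). The single-pixel helper below is a hypothetical illustration of that operator, not code from this PR:

```python
# Hypothetical sketch of Porter-Duff "over" compositing for one RGBA pixel
# (0-255 integer channels), the operation used to merge layered outputs
# with the bottom layer first.
def over(bg, fg):
    """Composite foreground pixel fg over background pixel bg."""
    fa = fg[3] / 255.0
    ba = bg[3] / 255.0
    oa = fa + ba * (1.0 - fa)  # resulting alpha
    if oa == 0.0:
        return (0, 0, 0, 0)  # fully transparent result
    rgb = tuple(
        round((fg[i] * fa + bg[i] * ba * (1.0 - fa)) / oa) for i in range(3)
    )
    return rgb + (round(oa * 255),)

# Opaque red base with a half-transparent blue layer on top:
print(over((255, 0, 0, 255), (0, 0, 255, 128)))  # → (127, 0, 128, 255)
```

Folding all returned layers left-to-right with this operator (or with Image.alpha_composite on whole images) should reproduce the unlayered edit result.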

(3) Running image edit also succeeds

INFO 12-20 14:16:47 [omni_diffusion.py:86] Prepared 1 requests for generation.
INFO 12-20 14:16:47 [diffusion_engine.py:43] Pre-processing completed in 0.0680 seconds
INFO 12-20 14:17:47 [shm_broadcast.py:501] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).
INFO 12-20 14:18:43 [diffusion_engine.py:48] Generation completed successfully.
INFO 12-20 14:18:43 [diffusion_engine.py:53] Post-processing completed in 0.0667 seconds
Total generation time: 116.1441 seconds (116144.12 ms)
Saved edited image to /home/d00806799/code/vllm-omni/examples/offline_inference/image_to_image/output_image_edit_0.png
INFO 12-20 14:18:43 [gpu_worker.py:198] Worker 0: Received shutdown message
INFO 12-20 14:18:43 [gpu_worker.py:222] event loop terminated.
INFO 12-20 14:18:43 [gpu_worker.py:253] Worker 0: Shutdown complete.

Test Result

(1) vllm-omni 2-layers
image

(2) diffusers 2-layers
image

(3) vllm-omni 3-layers
image

(4) diffusers 3-layers
image


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 73 to 77
assert requests.resolution in [640, 1024], (
    f"resolution must be either 640 or 1024, but got {requests.resolution}"
)
calculated_width, calculated_height = calculate_dimensions(
    requests.resolution * requests.resolution, image_size[0] / image_size[1]

P1 Badge Use request resolution instead of list attribute

Inside the pre-processing loop the resolution is read from requests rather than the individual req, so the function raises an AttributeError before any request is processed because the list object has no resolution attribute. This prevents the new layered pipeline from calculating dimensions or running at all.
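A sketch of the fix Codex suggests: read the resolution from each `req` inside the loop, not from the `requests` list (which has no `resolution` attribute). `Request` and `calculate_dimensions` below are simplified stand-ins for the PR's real types, included only so the sketch runs:

```python
# Hedged sketch: per-request resolution validation. The real PR's
# calculate_dimensions may differ; this stand-in just picks a width/height
# covering roughly target_area pixels at the given aspect ratio.
import math

def calculate_dimensions(target_area, aspect_ratio):
    width = math.sqrt(target_area * aspect_ratio)
    return round(width), round(width / aspect_ratio)

class Request:
    def __init__(self, resolution):
        self.resolution = resolution

def preprocess(requests, image_size):
    dims = []
    for req in requests:
        # Use req, not the enclosing requests list, inside the loop.
        assert req.resolution in (640, 1024), (
            f"resolution must be either 640 or 1024, but got {req.resolution}"
        )
        dims.append(calculate_dimensions(
            req.resolution * req.resolution, image_size[0] / image_size[1]
        ))
    return dims
```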


Comment on lines 733 to +736
 temb = (
-    self.time_text_embed(timestep, hidden_states)
+    self.time_text_embed(timestep, hidden_states, additional_t_cond)
     if guidance is None
-    else self.time_text_embed(timestep, guidance, hidden_states)
+    else self.time_text_embed(timestep, guidance, hidden_states, additional_t_cond)


P1 Badge Guidance branch calls time embedding with wrong signature

The guidance code path calls self.time_text_embed(timestep, guidance, hidden_states, additional_t_cond), but QwenTimestepProjEmbeddings.forward only accepts (timestep, hidden_states, addition_t_cond=None). When guidance is enabled (e.g., for guidance-distilled models), this path raises a TypeError before any diffusion steps run.
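A minimal reproduction of the mismatch the review describes, using a stand-in class with the same keyword signature (the class body here is illustrative only, not the real embedding code):

```python
# Hedged sketch: QwenTimestepProjEmbeddings.forward accepts
# (timestep, hidden_states, addition_t_cond=None), so calling it with four
# positional arguments, as the guidance branch in the diff does, raises
# TypeError before any diffusion step runs.
class QwenTimestepProjEmbeddings:
    def __call__(self, timestep, hidden_states, addition_t_cond=None):
        # The real forward projects the timestep (plus optional extra
        # conditioning) into an embedding; a placeholder return suffices.
        return "temb"

time_text_embed = QwenTimestepProjEmbeddings()

# Non-guidance branch matches the signature:
assert time_text_embed(0, "h", "cond") == "temb"

# Guidance branch passes four positional arguments and fails:
try:
    time_text_embed(0, "g", "h", "cond")
    raised = False
except TypeError:
    raised = True
assert raised
```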


@hsliuustc0106
Collaborator

please add a CI test

Collaborator

@ZJY0516 ZJY0516 left a comment

please also test that this does not break qwen-image and qwen-image-edit

@Bounty-hunter Bounty-hunter force-pushed the support_qwen_image_layered branch from 59173fe to 607147c on December 20, 2025 at 13:02
@@ -0,0 +1,1054 @@
# Copyright 2025 The Qwen-Image Team, Wan Team and The HuggingFace Team. All rights reserved.
Collaborator

We can import this from diffusers directly

Contributor Author

The image-layered changes in autoencoder_kl_qwenimage.py are only on the diffusers main branch; the current diffusers release is 0.36.

Collaborator

User should install latest diffusers from source as mentioned in https://huggingface.co/Qwen/Qwen-Image-Layered#quick-start
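One hedged way to make that requirement actionable is to probe for the pipeline class instead of pinning a version number, so users on a release build (0.36) get a clear error. `require_attr` is a hypothetical helper, demonstrated with a stdlib module so the sketch runs anywhere:

```python
# Hedged sketch: fail early with an actionable message if an installed
# package predates a needed symbol (e.g. diffusers.QwenImageLayeredPipeline,
# which at review time exists only on diffusers main).
import importlib

def require_attr(module_name, attr, hint=""):
    """Import module_name and return the named attribute, raising a clear
    ImportError (with an install hint) if the attribute is missing."""
    module = importlib.import_module(module_name)
    obj = getattr(module, attr, None)
    if obj is None:
        raise ImportError(f"{module_name} has no {attr!r}. {hint}".strip())
    return obj

# For this PR the call would look like:
#   require_attr("diffusers", "QwenImageLayeredPipeline",
#                hint="Install diffusers from source; see the model card.")
# Demonstrated with the stdlib so the sketch is runnable as-is:
sqrt = require_attr("math", "sqrt")
```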

Collaborator

remove this file please

Collaborator

shall we leave this for a later PR to fix?

@Bounty-hunter Bounty-hunter force-pushed the support_qwen_image_layered branch from 607147c to a2d8c99 on December 20, 2025 at 13:50
@hsliuustc0106
Collaborator

add the test result and provide the run example command

@hsliuustc0106 hsliuustc0106 added the ready label (to trigger buildkite CI) on Dec 20, 2025
@Bounty-hunter Bounty-hunter changed the title from "[WIP] Support model qwen image layered" to "[FEATURE] Support model qwen image layered" on Dec 20, 2025
@Bounty-hunter
Contributor Author

add the test result and provide the run example command
done

@hsliuustc0106 hsliuustc0106 changed the title from "[FEATURE] Support model qwen image layered" to "[New model] Support model qwen image layered" on Dec 20, 2025
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 enabled auto-merge (squash) December 20, 2025 14:50
@hsliuustc0106 hsliuustc0106 merged commit 85bc8bf into vllm-project:main Dec 20, 2025
6 checks passed
wtomin pushed a commit to wtomin/vllm-omni that referenced this pull request Dec 22, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
@Bounty-hunter Bounty-hunter mentioned this pull request Dec 30, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
@david6666666 david6666666 mentioned this pull request Jan 16, 2026

Labels

ready label to trigger buildkite CI
