
[Debug] Update GLM-Image Pipeline#1049

Merged
david6666666 merged 2 commits into vllm-project:main from tzhouam:dev/debug_glm_image
Jan 29, 2026

Conversation

Collaborator

@tzhouam tzhouam commented Jan 29, 2026

Purpose

This PR updates the GLM-Image pipeline, fixing the bug reported in #1017.

Test Plan

Tested with the same command:

python "examples/offline_inference/image_to_image/image_edit.py" \
  --model "zai-org/GLM-Image" \
  --image "qwen-bear.png" \
  --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
  --output "GLM-Image_edit.png" \
  --num_inference_steps 50 \
  --cfg_scale 4.0

Test Result

[attached image: edited output]
WARNING 01-29 03:49:28 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
WARNING 01-29 03:49:28 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
WARNING 01-29 03:49:32 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
INFO 01-29 03:49:32 [omni.py:119] Initializing stages for model: zai-org/GLM-Image
INFO 01-29 03:49:32 [initialization.py:35] No OmniTransferConfig provided
INFO 01-29 03:49:32 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'diffusion', 'runtime': {'process': True, 'devices': '0', 'max_batch_size': 1}, 'engine_args': {'vae_use_slicing': False, 'vae_use_tiling': False, 'cache_backend': None, 'cache_config': None, 'parallel_config': {'pipeline_parallel_size': 1, 'data_parallel_size': 1, 'tensor_parallel_size': 1, 'sequence_parallel_size': 1, 'ulysses_degree': 1, 'ring_degree': 1, 'cfg_parallel_size': 1}, 'enforce_eager': False, 'enable_cpu_offload': False, 'model': 'zai-org/GLM-Image', 'model_stage': 'diffusion'}, 'final_output': True, 'final_output_type': 'image'}
INFO 01-29 03:49:32 [omni.py:338] [Orchestrator] Waiting for 1 stages to initialize (timeout: 300s)
[Stage-0] WARNING 01-29 03:49:50 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] WARNING 01-29 03:49:50 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
[Stage-0] WARNING 01-29 03:49:52 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-29 03:49:52 [omni_stage.py:497] Starting stage worker with model: zai-org/GLM-Image
[Stage-0] INFO 01-29 03:49:52 [omni_stage.py:510] [Stage] Set VLLM_WORKER_MULTIPROC_METHOD=spawn
[Stage-0] INFO 01-29 03:49:54 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:49:54 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:49:54 [multiproc_executor.py:74] Starting server...
[Stage-0] WARNING 01-29 03:50:12 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] WARNING 01-29 03:50:12 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
[Stage-0] WARNING 01-29 03:50:15 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-29 03:50:21 [diffusion_worker.py:269] Worker 0 created result MessageQueue
[Stage-0] INFO 01-29 03:50:22 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=2048.
[Stage-0] INFO 01-29 03:50:22 [vllm.py:630] Asynchronous scheduling is enabled.
[Stage-0] INFO 01-29 03:50:22 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-29 03:50:22 [diffusion_worker.py:95] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-29 03:50:23 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:50:23 [pipeline_glm_image.py:284] Loading GlmImageForConditionalGeneration (AR model)...
Loading weights: 100%|███████████████████████████████████████████████████████████████████| 1011/1011 [00:03<00:00, 310.63it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
The image processor of type `GlmImageImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 
[Stage-0] INFO 01-29 03:50:29 [pipeline_glm_image.py:297] Loading T5EncoderModel (glyph encoder)...
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 526.65it/s, Materializing param=shared.weight]
[Stage-0] INFO 01-29 03:50:29 [pipeline_glm_image.py:310] Loading AutoencoderKL (VAE)...
[Stage-0] INFO 01-29 03:50:30 [pipeline_glm_image.py:317] Loading GlmImageTransformer2DModel (DiT)...
[Stage-0] INFO 01-29 03:50:30 [platform.py:65] Defaulting to diffusion attention backend FLASH_ATTN
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:00<00:01,  1.28it/s]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:01<00:00,  1.20it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.30it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.28it/s]

[Stage-0] INFO 01-29 03:50:34 [diffusers_loader.py:227] Loading weights took 2.34 seconds
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:100] Model loading took 33.0291 GiB and 12.622001 seconds
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:105] Model runner: Model loaded successfully.
[Stage-0] WARNING 01-29 03:50:35 [compile.py:27] Regional compilation skipped because the model does not define `_repeated_blocks`.
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:127] Model runner: Model compiled with torch.compile.
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:137] Model runner: Initialization complete.
[Stage-0] INFO 01-29 03:50:35 [manager.py:90] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:126] Worker 0: Initialization complete.
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:393] Worker 0: Scheduler loop started.
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:320] Worker 0 ready to receive requests via shared memory
[Stage-0] INFO 01-29 03:50:35 [scheduler.py:39] SyncScheduler initialized result MessageQueue
[Stage-0] INFO 01-29 03:50:35 [diffusion_engine.py:332] dummy run to warm up the model
[Stage-0] INFO 01-29 03:50:35 [manager.py:538] Deactivating all adapters: 0 layers
[Stage-0] WARNING 01-29 03:50:35 [kv_transfer_manager.py:452] Request has no ID, cannot receive KV cache
[Stage-0] INFO 01-29 03:50:35 [pipeline_glm_image.py:910] Generating prior tokens with AR model...
[Stage-0] INFO 01-29 03:50:59 [pipeline_glm_image.py:919] Encoding prompt...
[Stage-0] INFO 01-29 03:50:59 [pipeline_glm_image.py:975] Starting denoising loop with 1 steps...
[Stage-0] INFO 01-29 03:51:00 [pipeline_glm_image.py:990] Decoding latents with VAE...
[Stage-0] INFO 01-29 03:51:01 [omni_stage.py:731] Max batch size: 1
INFO 01-29 03:51:01 [omni.py:331] [Orchestrator] Stage-0 reported ready
INFO 01-29 03:51:01 [omni.py:357] [Orchestrator] All stages initialized successfully
Pipeline loaded

============================================================
Generation Configuration:
  Model: zai-org/GLM-Image
  Inference steps: 50
  Cache backend: None (no acceleration)
  Input image size: (832, 1056)
  Parallel configuration: ulysses_degree=1, ring_degree=1, cfg_parallel_size=1, tensor_parallel_size=1
============================================================

Adding requests:   0%|                                                  | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 unit/s, output: 0.00 unit/s]
[Stage-0] INFO 01-29 03:51:01 [diffusion_engine.py:70] Pre-processing completed in 0.0345 seconds
[Stage-0] INFO 01-29 03:51:01 [manager.py:538] Deactivating all adapters: 0 layers
[Stage-0] WARNING 01-29 03:51:01 [kv_transfer_manager.py:356] No connector available for receiving KV cache
[Stage-0] INFO 01-29 03:51:01 [pipeline_glm_image.py:910] Generating prior tokens with AR model...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:919] Encoding prompt...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:932] Preparing KV cache for Image Edit mode...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:975] Starting denoising loop with 50 steps...
[Stage-0] INFO 01-29 03:51:24 [pipeline_glm_image.py:990] Decoding latents with VAE...
[Stage-0] INFO 01-29 03:51:24 [diffusion_engine.py:75] Generation completed successfully.
[Stage-0] INFO 01-29 03:51:24 [diffusion_engine.py:93] Post-processing completed in 0.0582 seconds
INFO 01-29 03:51:24 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-29 03:51:24 [log_utils.py:550]  'request_id': '0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea',
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_time_ms': 23183.69770050049,
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-29 03:51:24 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-29 03:51:24 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-29 03:51:24 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 23065.14549255371,
INFO 01-29 03:51:24 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-29 03:51:24 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.18s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-29 03:51:24 [omni.py:860] [Summary] {'e2e_requests': 1,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_total_time_ms': 23184.942483901978,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_sum_time_ms': 23183.69770050049,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_total_tokens': 0,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_avg_time_per_request_ms': 23183.69770050049,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-29 03:51:24 [omni.py:860]  'wall_time_ms': 23184.942483901978,
INFO 01-29 03:51:24 [omni.py:860]  'final_stage_id': {'0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea': 0},
INFO 01-29 03:51:24 [omni.py:860]  'stages': [{'stage_id': 0,
INFO 01-29 03:51:24 [omni.py:860]              'requests': 1,
INFO 01-29 03:51:24 [omni.py:860]              'tokens': 0,
INFO 01-29 03:51:24 [omni.py:860]              'total_time_ms': 23184.138536453247,
INFO 01-29 03:51:24 [omni.py:860]              'avg_time_per_request_ms': 23184.138536453247,
INFO 01-29 03:51:24 [omni.py:860]              'avg_tokens_per_s': 0.0}],
INFO 01-29 03:51:24 [omni.py:860]  'transfers': []}
Total generation time: 23.1862 seconds (23186.17 ms)
INFO 01-29 03:51:24 [image_edit.py:431] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt={'prompt': "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'", 'negative_prompt': None, 'multi_modal_data': {'image': <PIL.Image.Image image mode=RGB size=832x1056 at 0x7FF8E13332C0>}, 'additional_information': {'global_request_id': ['0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea'], 'preprocessed_image': tensor([[[[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]],
INFO 01-29 03:51:24 [image_edit.py:431] 
INFO 01-29 03:51:24 [image_edit.py:431]          [[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]],
INFO 01-29 03:51:24 [image_edit.py:431] 
INFO 01-29 03:51:24 [image_edit.py:431]          [[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]]]]), 'prompt_image': [<PIL.Image.Image image mode=RGB size=832x1056 at 0x7FF8E1333350>]}}, latents=None, metrics={}, multimodal_output={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Saved edited image to /home/dyvm6xra/dyvm6xrauser49/project/GLM-Image_edit.png
[Stage-0] INFO 01-29 03:51:25 [omni_stage.py:779] Received shutdown signal
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 814ce4a4e6


Comment on lines 491 to 495
condition_grid = image_grid_thw[:-1]
prior_token_image_embed = self.vision_language_encoder.get_image_features(
    inputs["pixel_values"], condition_grid
).pooler_output
prior_token_image_embed = torch.cat(prior_token_image_embed, dim=0)


P1: Remove torch.cat on pooled image features tensor

With the new .pooler_output access, prior_token_image_embed is a single tensor (the pooled image features). torch.cat(prior_token_image_embed, dim=0) now raises TypeError: cat() received an invalid combination of arguments because torch.cat requires a sequence of tensors, not a tensor. This will crash image-edit requests that include condition images (the only path where this block runs). Consider using the tensor directly (or wrapping it in a list only if you truly need to concatenate multiple tensors).
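The failure mode Codex describes can be sketched in isolation. This is a minimal, illustrative example (the `pooled` tensor is a stand-in for the pipeline's `.pooler_output`, not the actual model code): passing a single tensor to `torch.cat` raises a `TypeError`, while using the tensor directly, or wrapping it in a list when concatenation is genuinely needed, works.

```python
import torch

# Stand-in for get_image_features(...).pooler_output: a single tensor,
# not a sequence of tensors.
pooled = torch.randn(2, 4)

# Buggy pattern: torch.cat expects a sequence of tensors, so handing it
# a bare tensor raises TypeError ("invalid combination of arguments").
raised = False
try:
    torch.cat(pooled, dim=0)
except TypeError:
    raised = True

# Fix: use the pooled tensor directly, or wrap it in a list only if
# there are genuinely multiple tensors to concatenate.
fixed = torch.cat([pooled], dim=0)
assert torch.equal(fixed, pooled)
```

Wrapping a single tensor in a one-element list is a no-op concatenation, which is why dropping the `torch.cat` call entirely is the cleaner fix when only one tensor ever reaches this line.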


@tzhouam tzhouam added the ready label Jan 29, 2026
@david6666666
Collaborator

LGTM

@david6666666 david6666666 merged commit def3956 into vllm-project:main Jan 29, 2026
7 checks passed
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026
Co-authored-by: root <root@hk01dgx028.cm.cluster>
@tzhouam tzhouam deleted the dev/debug_glm_image branch February 23, 2026 03:55

Labels

ready (label to trigger buildkite CI)


Development

Successfully merging this pull request may close these issues.

[Bug]: zai-org/GLM-Image image_to_image error, ValueError: not enough values to unpack (expected 4, got 3)
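The `ValueError` in the linked issue is Python's generic unpacking failure. A minimal sketch of the error class (the tuple below is hypothetical, not the pipeline's actual tensor shape):

```python
# Illustrative only: unpacking a 3-element shape into 4 names raises
# exactly the ValueError quoted in the linked issue, e.g. when a CHW
# shape is missing its batch dimension.
shape = (3, 832, 1056)  # hypothetical shape, not from the real pipeline

message = ""
try:
    b, c, h, w = shape
except ValueError as e:
    message = str(e)

assert "not enough values to unpack (expected 4, got 3)" in message
```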

2 participants