
[Debug] Update GLM-Image Pipeline#1049

Merged
david6666666 merged 2 commits into vllm-project:main from tzhouam:dev/debug_glm_image
Jan 29, 2026

Conversation

Collaborator

@tzhouam tzhouam commented Jan 29, 2026

Purpose

This PR updates the GLM-Image pipeline, fixing the bug reported in #1017.

Test Plan

Tested with the same command:

python "examples/offline_inference/image_to_image/image_edit.py" \
  --model "zai-org/GLM-Image" \
  --image "qwen-bear.png" \
  --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
  --output "GLM-Image_edit.png" \
  --num_inference_steps 50 \
  --cfg_scale 4.0

Test Result

[attached image: edited output]
WARNING 01-29 03:49:28 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
WARNING 01-29 03:49:28 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
WARNING 01-29 03:49:32 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
INFO 01-29 03:49:32 [omni.py:119] Initializing stages for model: zai-org/GLM-Image
INFO 01-29 03:49:32 [initialization.py:35] No OmniTransferConfig provided
INFO 01-29 03:49:32 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'diffusion', 'runtime': {'process': True, 'devices': '0', 'max_batch_size': 1}, 'engine_args': {'vae_use_slicing': False, 'vae_use_tiling': False, 'cache_backend': None, 'cache_config': None, 'parallel_config': {'pipeline_parallel_size': 1, 'data_parallel_size': 1, 'tensor_parallel_size': 1, 'sequence_parallel_size': 1, 'ulysses_degree': 1, 'ring_degree': 1, 'cfg_parallel_size': 1}, 'enforce_eager': False, 'enable_cpu_offload': False, 'model': 'zai-org/GLM-Image', 'model_stage': 'diffusion'}, 'final_output': True, 'final_output_type': 'image'}
INFO 01-29 03:49:32 [omni.py:338] [Orchestrator] Waiting for 1 stages to initialize (timeout: 300s)
[Stage-0] WARNING 01-29 03:49:50 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] WARNING 01-29 03:49:50 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
[Stage-0] WARNING 01-29 03:49:52 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-29 03:49:52 [omni_stage.py:497] Starting stage worker with model: zai-org/GLM-Image
[Stage-0] INFO 01-29 03:49:52 [omni_stage.py:510] [Stage] Set VLLM_WORKER_MULTIPROC_METHOD=spawn
[Stage-0] INFO 01-29 03:49:54 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:49:54 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:49:54 [multiproc_executor.py:74] Starting server...
[Stage-0] WARNING 01-29 03:50:12 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] WARNING 01-29 03:50:12 [yuanrong_connector.py:14] Datasystem not available, YuanrongConnector will not work
[Stage-0] WARNING 01-29 03:50:15 [envs.py:76] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-29 03:50:21 [diffusion_worker.py:269] Worker 0 created result MessageQueue
[Stage-0] INFO 01-29 03:50:22 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=2048.
[Stage-0] INFO 01-29 03:50:22 [vllm.py:630] Asynchronous scheduling is enabled.
[Stage-0] INFO 01-29 03:50:22 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-29 03:50:22 [diffusion_worker.py:95] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-29 03:50:23 [weight_utils.py:46] Using model weights format ['*']
[Stage-0] INFO 01-29 03:50:23 [pipeline_glm_image.py:284] Loading GlmImageForConditionalGeneration (AR model)...
Loading weights: 100%|███████████████████████████████████████████████████████████████████| 1011/1011 [00:03<00:00, 310.63it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
The image processor of type `GlmImageImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 
[Stage-0] INFO 01-29 03:50:29 [pipeline_glm_image.py:297] Loading T5EncoderModel (glyph encoder)...
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 526.65it/s, Materializing param=shared.weight]
[Stage-0] INFO 01-29 03:50:29 [pipeline_glm_image.py:310] Loading AutoencoderKL (VAE)...
[Stage-0] INFO 01-29 03:50:30 [pipeline_glm_image.py:317] Loading GlmImageTransformer2DModel (DiT)...
[Stage-0] INFO 01-29 03:50:30 [platform.py:65] Defaulting to diffusion attention backend FLASH_ATTN
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:00<00:01,  1.28it/s]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:01<00:00,  1.20it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.30it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.28it/s]

[Stage-0] INFO 01-29 03:50:34 [diffusers_loader.py:227] Loading weights took 2.34 seconds
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:100] Model loading took 33.0291 GiB and 12.622001 seconds
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:105] Model runner: Model loaded successfully.
[Stage-0] WARNING 01-29 03:50:35 [compile.py:27] Regional compilation skipped because the model does not define `_repeated_blocks`.
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:127] Model runner: Model compiled with torch.compile.
[Stage-0] INFO 01-29 03:50:35 [diffusion_model_runner.py:137] Model runner: Initialization complete.
[Stage-0] INFO 01-29 03:50:35 [manager.py:90] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:126] Worker 0: Initialization complete.
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:393] Worker 0: Scheduler loop started.
[Stage-0] INFO 01-29 03:50:35 [diffusion_worker.py:320] Worker 0 ready to receive requests via shared memory
[Stage-0] INFO 01-29 03:50:35 [scheduler.py:39] SyncScheduler initialized result MessageQueue
[Stage-0] INFO 01-29 03:50:35 [diffusion_engine.py:332] dummy run to warm up the model
[Stage-0] INFO 01-29 03:50:35 [manager.py:538] Deactivating all adapters: 0 layers
[Stage-0] WARNING 01-29 03:50:35 [kv_transfer_manager.py:452] Request has no ID, cannot receive KV cache
[Stage-0] INFO 01-29 03:50:35 [pipeline_glm_image.py:910] Generating prior tokens with AR model...
[Stage-0] INFO 01-29 03:50:59 [pipeline_glm_image.py:919] Encoding prompt...
[Stage-0] INFO 01-29 03:50:59 [pipeline_glm_image.py:975] Starting denoising loop with 1 steps...
[Stage-0] INFO 01-29 03:51:00 [pipeline_glm_image.py:990] Decoding latents with VAE...
[Stage-0] INFO 01-29 03:51:01 [omni_stage.py:731] Max batch size: 1
INFO 01-29 03:51:01 [omni.py:331] [Orchestrator] Stage-0 reported ready
INFO 01-29 03:51:01 [omni.py:357] [Orchestrator] All stages initialized successfully
Pipeline loaded

============================================================
Generation Configuration:
  Model: zai-org/GLM-Image
  Inference steps: 50
  Cache backend: None (no acceleration)
  Input image size: (832, 1056)
  Parallel configuration: ulysses_degree=1, ring_degree=1, cfg_parallel_size=1, tensor_parallel_size=1
============================================================

Adding requests:   0%|                                                  | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 unit/s, output: 0.00 unit/s]
[Stage-0] INFO 01-29 03:51:01 [diffusion_engine.py:70] Pre-processing completed in 0.0345 seconds
[Stage-0] INFO 01-29 03:51:01 [manager.py:538] Deactivating all adapters: 0 layers
[Stage-0] WARNING 01-29 03:51:01 [kv_transfer_manager.py:356] No connector available for receiving KV cache
[Stage-0] INFO 01-29 03:51:01 [pipeline_glm_image.py:910] Generating prior tokens with AR model...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:919] Encoding prompt...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:932] Preparing KV cache for Image Edit mode...
[Stage-0] INFO 01-29 03:51:17 [pipeline_glm_image.py:975] Starting denoising loop with 50 steps...
[Stage-0] INFO 01-29 03:51:24 [pipeline_glm_image.py:990] Decoding latents with VAE...
[Stage-0] INFO 01-29 03:51:24 [diffusion_engine.py:75] Generation completed successfully.
[Stage-0] INFO 01-29 03:51:24 [diffusion_engine.py:93] Post-processing completed in 0.0582 seconds
INFO 01-29 03:51:24 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-29 03:51:24 [log_utils.py:550]  'request_id': '0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea',
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_time_ms': 23183.69770050049,
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-29 03:51:24 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-29 03:51:24 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-29 03:51:24 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-29 03:51:24 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 23065.14549255371,
INFO 01-29 03:51:24 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-29 03:51:24 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.18s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-29 03:51:24 [omni.py:860] [Summary] {'e2e_requests': 1,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_total_time_ms': 23184.942483901978,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_sum_time_ms': 23183.69770050049,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_total_tokens': 0,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_avg_time_per_request_ms': 23183.69770050049,
INFO 01-29 03:51:24 [omni.py:860]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-29 03:51:24 [omni.py:860]  'wall_time_ms': 23184.942483901978,
INFO 01-29 03:51:24 [omni.py:860]  'final_stage_id': {'0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea': 0},
INFO 01-29 03:51:24 [omni.py:860]  'stages': [{'stage_id': 0,
INFO 01-29 03:51:24 [omni.py:860]              'requests': 1,
INFO 01-29 03:51:24 [omni.py:860]              'tokens': 0,
INFO 01-29 03:51:24 [omni.py:860]              'total_time_ms': 23184.138536453247,
INFO 01-29 03:51:24 [omni.py:860]              'avg_time_per_request_ms': 23184.138536453247,
INFO 01-29 03:51:24 [omni.py:860]              'avg_tokens_per_s': 0.0}],
INFO 01-29 03:51:24 [omni.py:860]  'transfers': []}
Total generation time: 23.1862 seconds (23186.17 ms)
INFO 01-29 03:51:24 [image_edit.py:431] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt={'prompt': "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'", 'negative_prompt': None, 'multi_modal_data': {'image': <PIL.Image.Image image mode=RGB size=832x1056 at 0x7FF8E13332C0>}, 'additional_information': {'global_request_id': ['0_e9e033ee-cf9a-4919-9f53-a6db5c67ceea'], 'preprocessed_image': tensor([[[[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]],
INFO 01-29 03:51:24 [image_edit.py:431] 
INFO 01-29 03:51:24 [image_edit.py:431]          [[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]],
INFO 01-29 03:51:24 [image_edit.py:431] 
INFO 01-29 03:51:24 [image_edit.py:431]          [[ 1.0000,  0.9843,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9922,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 0.9765,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  1.0000],
INFO 01-29 03:51:24 [image_edit.py:431]           ...,
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9529, -1.0000, -0.9765],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -0.9922, -1.0000, -0.9922],
INFO 01-29 03:51:24 [image_edit.py:431]           [ 1.0000,  1.0000,  1.0000,  ..., -1.0000, -0.9765, -1.0000]]]]), 'prompt_image': [<PIL.Image.Image image mode=RGB size=832x1056 at 0x7FF8E1333350>]}}, latents=None, metrics={}, multimodal_output={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Saved edited image to /home/dyvm6xra/dyvm6xrauser49/project/GLM-Image_edit.png
[Stage-0] INFO 01-29 03:51:25 [omni_stage.py:779] Received shutdown signal
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 814ce4a4e6


Comment on lines 491 to 495
condition_grid = image_grid_thw[:-1]
prior_token_image_embed = self.vision_language_encoder.get_image_features(
    inputs["pixel_values"], condition_grid
).pooler_output
prior_token_image_embed = torch.cat(prior_token_image_embed, dim=0)


P1: Remove torch.cat on pooled image features tensor

With the new .pooler_output access, prior_token_image_embed is a single tensor (the pooled image features). torch.cat(prior_token_image_embed, dim=0) now raises TypeError: cat() received an invalid combination of arguments because torch.cat requires a sequence of tensors, not a tensor. This will crash image-edit requests that include condition images (the only path where this block runs). Consider using the tensor directly (or wrapping it in a list only if you truly need to concatenate multiple tensors).
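The failure mode Codex describes can be sketched in isolation. This is a minimal, illustrative example (the `pooled` tensor is a stand-in for the pipeline's `.pooler_output`, not the actual model code): passing a single tensor to `torch.cat` raises a `TypeError`, while using the tensor directly, or wrapping it in a list when concatenation is genuinely needed, works.

```python
import torch

# Stand-in for get_image_features(...).pooler_output: a single tensor,
# not a sequence of tensors.
pooled = torch.randn(2, 4)

# Buggy pattern: torch.cat expects a sequence of tensors, so handing it
# a bare tensor raises TypeError ("invalid combination of arguments").
raised = False
try:
    torch.cat(pooled, dim=0)
except TypeError:
    raised = True

# Fix: use the pooled tensor directly, or wrap it in a list only if
# there are genuinely multiple tensors to concatenate.
fixed = torch.cat([pooled], dim=0)
assert torch.equal(fixed, pooled)
```

Wrapping a single tensor in a one-element list is a no-op concatenation, which is why dropping the `torch.cat` call entirely is the cleaner fix when only one tensor ever reaches this line.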


@tzhouam tzhouam added the ready label Jan 29, 2026
@david6666666
Collaborator

LGTM

@david6666666 david6666666 merged commit def3956 into vllm-project:main Jan 29, 2026
7 checks passed
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026
Co-authored-by: root <root@hk01dgx028.cm.cluster>
@tzhouam tzhouam deleted the dev/debug_glm_image branch February 23, 2026 03:55

Labels

ready (label to trigger buildkite CI)


Development

Successfully merging this pull request may close these issues.

[Bug]: zai-org/GLM-Image image_to_image error, ValueError: not enough values to unpack (expected 4, got 3)
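The `ValueError` in the linked issue is Python's generic unpacking failure. A minimal sketch of the error class (the tuple below is hypothetical, not the pipeline's actual tensor shape):

```python
# Illustrative only: unpacking a 3-element shape into 4 names raises
# exactly the ValueError quoted in the linked issue, e.g. when a CHW
# shape is missing its batch dimension.
shape = (3, 832, 1056)  # hypothetical shape, not from the real pipeline

message = ""
try:
    b, c, h, w = shape
except ValueError as e:
    message = str(e)

assert "not enough values to unpack (expected 4, got 3)" in message
```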

2 participants