
[Misc] Fix error log for the diffusion stage timeout #915

Merged
david6666666 merged 3 commits into vllm-project:main from SamitHuang:fix_log
Jan 23, 2026


@SamitHuang
Collaborator

Purpose

Currently, when running a new diffusion model, we usually see the following error log because downloading the model takes longer than the stage initialization timeout:

WARNING 01-23 02:09:33 [omni.py:326] [Orchestrator] Initialization timeout: 0/1 stages ready. Missing stages: [0]
ERROR 01-23 02:09:33 [omni.py:340] [Orchestrator] Stage initialization failed. Troubleshooting Steps:
ERROR 01-23 02:09:33 [omni.py:340]   1) Verify GPU/device assignment in config (runtime.devices) is correct.
ERROR 01-23 02:09:33 [omni.py:340]   2) Check GPU/host memory availability; reduce model or batch size if needed.
ERROR 01-23 02:09:33 [omni.py:340]   3) Check model weights path and network reachability (if loading remotely).
ERROR 01-23 02:09:33 [omni.py:340]   4) Increase initialization wait time (stage_init_timeout or call-site timeout).

However, the timeout does not prevent generation from eventually succeeding:

INFO 01-23 02:30:36 [omni.py:816]  'e2e_total_time_ms': 1263149.7077941895,
INFO 01-23 02:30:36 [omni.py:816]  'e2e_sum_time_ms': 1263148.5142707825,
INFO 01-23 02:30:36 [omni.py:816]  'e2e_total_tokens': 0,
INFO 01-23 02:30:36 [omni.py:816]  'e2e_avg_time_per_request_ms': 1263148.5142707825,
INFO 01-23 02:30:36 [omni.py:816]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-23 02:30:36 [omni.py:816]  'wall_time_ms': 1263149.7077941895,
INFO 01-23 02:30:36 [omni.py:816]  'final_stage_id': {'0_47432604-f7dd-46b9-b6db-d718915e4a62': 0},
INFO 01-23 02:30:36 [omni.py:816]  'stages': [{'stage_id': 0,
INFO 01-23 02:30:36 [omni.py:816]              'requests': 1,
INFO 01-23 02:30:36 [omni.py:816]              'tokens': 0,
INFO 01-23 02:30:36 [omni.py:816]              'total_time_ms': 1263149.003982544,
INFO 01-23 02:30:36 [omni.py:816]              'avg_time_per_request_ms': 1263149.003982544,
INFO 01-23 02:30:36 [omni.py:816]              'avg_tokens_per_s': 0.0}],
INFO 01-23 02:30:36 [omni.py:816]  'transfers': []}
Adding requests:   0%|                                                                                                                                             | 0/1 [21:03<?, ?it/s]
Saved generated video to i2v_output.mp4
[Stage-0] INFO 01-23 02:30:47 [omni_stage.py:722] Received shutdown signal
[Stage-0] INFO 01-23 02:30:47 [gpu_diffusion_worker.py:315] Worker 0: Received shutdown message
[Stage-0] INFO 01-23 02:30:47 [gpu_diffusion_worker.py:336] event loop terminated.
[Stage-0] INFO 01-23 02:30:47 [gpu_diffusion_worker.py:364] Worker 0: Shutdown complete.
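
As a quick sanity check on the two log excerpts above, the gap between the timeout warning and the final stats line can be compared with the reported end-to-end time. This is only a small arithmetic sketch; the helper name `as_min_sec` is made up for illustration, and the values are copied from the logs.

```python
from datetime import datetime, timedelta

# Values copied from the log excerpts above.
timeout_warning_at = datetime.strptime("02:09:33", "%H:%M:%S")
stats_logged_at = datetime.strptime("02:30:36", "%H:%M:%S")
e2e_total_time_ms = 1263149.7077941895

# Time between the timeout warning and the final stats line.
elapsed = stats_logged_at - timeout_warning_at
# End-to-end time reported by the run itself.
e2e = timedelta(milliseconds=e2e_total_time_ms)


def as_min_sec(delta: timedelta) -> str:
    """Format a timedelta as MM:SS, dropping fractional seconds."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}:{seconds:02d}"


print(as_min_sec(elapsed))  # 21:03
print(as_min_sec(e2e))      # 21:03
```

Both values agree with the 21:03 shown on the progress bar, which suggests the stage kept initializing and serving the request after the timeout message was logged.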

This PR changes the initialization-failure error message to a timeout warning to avoid confusion.
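
A minimal sketch of the idea, not the actual code in omni.py: when the wait for stage readiness times out, report it at WARNING level and return the missing stages instead of logging a failure. The function `wait_for_stages`, its parameters, and the wording of the message are all hypothetical and chosen only for illustration.

```python
import logging
import time

logger = logging.getLogger("orchestrator_sketch")  # hypothetical logger name


def wait_for_stages(is_ready, num_stages, timeout_s, poll_interval_s=0.01):
    """Poll until every stage is ready or the timeout expires.

    Returns the list of stage ids that were still not ready.
    `is_ready` is a hypothetical callable: stage id -> bool.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        missing = [i for i in range(num_stages) if not is_ready(i)]
        if not missing or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)

    if missing:
        # A timeout is not necessarily a failure: the stage may still be
        # downloading weights, so this is a warning rather than an error.
        logger.warning(
            "Initialization timeout: %d/%d stages ready. Missing stages: %s. "
            "Stages may still be loading.",
            num_stages - len(missing), num_stages, missing,
        )
    return missing
```

For example, `wait_for_stages(lambda stage_id: False, 1, 0.05)` returns `[0]` after logging a single warning, and the caller can decide whether to keep waiting.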

Test Plan

Test Result


Signed-off-by: samithuang <[email protected]>
Co-authored-by: Jiangyun Zhu <[email protected]>
Signed-off-by: Hongsheng Liu <[email protected]>
@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) on Jan 23, 2026
@david6666666 merged commit 5d9918f into vllm-project:main on Jan 23, 2026
7 checks passed