[Debug] Correct Unreasonable Long Timeout #1175
Merged
david6666666 merged 1 commit into vllm-project:main on Feb 3, 2026
Conversation
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
david6666666 approved these changes on Feb 3, 2026
futurenitian pushed a commit to futurenitian/vllm-omni that referenced this pull request on Feb 4, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: future fu <3172516720@qq.com>
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request on Feb 20, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Purpose
Currently, some timeouts are unreasonably long (e.g., 60,000 seconds, or approximately 16.7 hours). This appears to result from a unit mismatch between seconds and milliseconds: a value intended as 60,000 milliseconds (60 s) is being interpreted as seconds. This PR identifies and corrects these thresholds in the online serving and diffusion RPC configurations.
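The failure mode can be sketched as follows. This is an illustrative example only: the constant and function names here are hypothetical and are not the actual vllm-omni identifiers touched by this PR.

```python
# Illustrative sketch of a seconds/milliseconds unit mismatch.
# Names below (DEFAULT_RPC_TIMEOUT_MS, to_seconds) are hypothetical,
# not the actual vllm-omni code.
DEFAULT_RPC_TIMEOUT_MS = 60_000  # intended meaning: 60 seconds


def to_seconds(timeout_ms: int) -> float:
    """Convert a millisecond timeout to the seconds a blocking API expects."""
    return timeout_ms / 1000.0


# Bug: the raw millisecond value is passed directly to a seconds-based API,
# so a 60-second timeout silently becomes a ~16.7-hour one.
buggy_wait_s = float(DEFAULT_RPC_TIMEOUT_MS)       # 60000.0 s
# Fix: convert units before passing the value on.
fixed_wait_s = to_seconds(DEFAULT_RPC_TIMEOUT_MS)  # 60.0 s
```

A timeout this long effectively disables the timeout in practice, since callers hang for hours instead of failing fast.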
Test Plan

```
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8092
```

Test Result

```
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] █ █ █▄ ▄█
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] ▄▄ ▄█ █ █ █ ▀▄▀ █    version 0.15.0
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] █▄█▀ █ █ █ █         model Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:261] non-default args: {'model_tag': 'Qwen/Qwen3-Omni-30B-A3B-Instruct', 'port': 8092, 'model': 'Qwen/Qwen3-Omni-30B-A3B-Instruct'}
(APIServer pid=40636) INFO 02-03 07:16:48 [omni.py:119] Initializing stages for model: Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('1', '2')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:234] Loaded OmniTransferConfig with 2 connector configurations
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 1 -> 2: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.9, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'hf_config_name': 'thinker_config', 'tensor_parallel_size': 1, 'max_num_seqs': 64, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'talker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.6, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'distributed_executor_backend': 'mp', 'hf_config_name': 'talker_config', 'max_num_seqs': 64, 'async_chunk': False}, 'engine_input_source': [0], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker', 'default_sampling_params': {'temperature': 0.9, 'top_k': 50, 'max_tokens': 4096, 'seed': 42, 'detokenize': False, 'repetition_penalty': 1.05, 'stop_token_ids': [2150]}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 2, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'code2wav', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'generation', 'scheduler_cls': 'vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler', 'enforce_eager': True, 'trust_remote_code': True, 'async_scheduling': False, 'enable_prefix_caching': False, 'engine_output_type': 'audio', 'gpu_memory_utilization': 0.1, 'distributed_executor_backend': 'mp', 'max_num_batched_tokens': 1000000, 'hf_config_name': 'thinker_config', 'max_num_seqs': 1, 'async_chunk': False}, 'engine_input_source': [1], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.talker2code2wav', 'final_output': True, 'final_output_type': 'audio', 'default_sampling_params': {'temperature': 0.0, 'top_p': 1.0, 'top_k': -1, 'max_tokens': 65536, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.1}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni.py:338] [AsyncOrchestrator] Waiting for 3 stages to initialize (timeout: 600s)
[Stage-2] INFO 02-03 07:20:20 [initialization.py:288] [Stage-2] Initializing OmniConnectors with config keys: ['from_stage_1']
[Stage-2] INFO 02-03 07:20:20 [factory.py:46] Created connector: SharedMemoryConnector
```