[Debug] Correct Unreasonable Long Timeout #1175

Merged: david6666666 merged 1 commit into vllm-project:main from tzhouam:dev/reduce-timeout on Feb 3, 2026
@tzhouam (Collaborator) commented Feb 3, 2026


Purpose

Currently, some timeouts are unreasonably long (e.g., 60,000 seconds, or approx. 16.7 hours). This appears to be a unit mismatch between seconds and milliseconds. We have identified and updated these thresholds in the online serving and diffusion RPC configurations.
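
To make the arithmetic behind the unit mismatch concrete, here is a minimal sketch. The helper `ms_to_seconds` and the constant `MS_PER_SECOND` are hypothetical names used only for illustration; they are not taken from the vllm-omni code base, and the actual replacement values chosen in this PR are not shown here.

```python
# Illustration only: how the same number reads under two different units.
MS_PER_SECOND = 1000  # hypothetical constant, not from vllm-omni


def ms_to_seconds(timeout_ms: float) -> float:
    """Convert a timeout given in milliseconds to seconds."""
    return timeout_ms / MS_PER_SECOND


raw_value = 60_000

# If an API expecting seconds receives the raw value, the wait is hours long.
as_seconds_hours = raw_value / 3600

# If the value was meant as milliseconds, it is a one-minute timeout.
as_millis_seconds = ms_to_seconds(raw_value)

print(f"{raw_value} s  = {as_seconds_hours:.1f} hours")
print(f"{raw_value} ms = {as_millis_seconds:.0f} seconds")
```

Putting the unit in the name (for example a `_s` or `_ms` suffix) and converting at a single boundary is one common way to keep this kind of mismatch from recurring.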

Test Plan

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8092

Test Result

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8092
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] 
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]        █     █     █▄   ▄█
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.15.0
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]   █▄█▀ █     █     █     █  model   Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] 
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:261] non-default args: {'model_tag': 'Qwen/Qwen3-Omni-30B-A3B-Instruct', 'port': 8092, 'model': 'Qwen/Qwen3-Omni-30B-A3B-Instruct'}
(APIServer pid=40636) INFO 02-03 07:16:48 [omni.py:119] Initializing stages for model: Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('1', '2')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:234] Loaded OmniTransferConfig with 2 connector configurations
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 1 -> 2: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.9, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'hf_config_name': 'thinker_config', 'tensor_parallel_size': 1, 'max_num_seqs': 64, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'talker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.6, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'distributed_executor_backend': 'mp', 'hf_config_name': 'talker_config', 'max_num_seqs': 64, 'async_chunk': False}, 'engine_input_source': [0], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker', 'default_sampling_params': {'temperature': 0.9, 'top_k': 50, 'max_tokens': 4096, 'seed': 42, 'detokenize': False, 'repetition_penalty': 1.05, 'stop_token_ids': [2150]}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 2, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'code2wav', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'generation', 'scheduler_cls': 'vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler', 'enforce_eager': True, 'trust_remote_code': True, 'async_scheduling': False, 'enable_prefix_caching': False, 'engine_output_type': 'audio', 'gpu_memory_utilization': 0.1, 'distributed_executor_backend': 'mp', 'max_num_batched_tokens': 1000000, 'hf_config_name': 'thinker_config', 'max_num_seqs': 1, 'async_chunk': False}, 'engine_input_source': [1], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.talker2code2wav', 'final_output': True, 'final_output_type': 'audio', 'default_sampling_params': {'temperature': 0.0, 'top_p': 1.0, 'top_k': -1, 'max_tokens': 65536, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.1}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni.py:338] [AsyncOrchestrator] Waiting for 3 stages to initialize (timeout: 600s)
[Stage-2] INFO 02-03 07:20:20 [initialization.py:288] [Stage-2] Initializing OmniConnectors with config keys: ['from_stage_1']
[Stage-2] INFO 02-03 07:20:20 [factory.py:46] Created connector: SharedMemoryConnector
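
The log above shows the orchestrator waiting for 3 stages with a 600 s timeout. The following is a rough sketch of what a deadline-bounded wait with an explicit seconds unit can look like. It is not the vllm-omni implementation: `wait_for_stages`, `is_ready`, and `poll_interval_s` are hypothetical names assumed for this example.

```python
import time
from typing import Callable


def wait_for_stages(
    is_ready: Callable[[int], bool],
    num_stages: int,
    timeout_s: float,
    poll_interval_s: float = 0.01,
) -> bool:
    """Poll until every stage reports ready or timeout_s seconds elapse.

    Returns True if all stages became ready, False on timeout.
    """
    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout_s
    pending = set(range(num_stages))
    while pending:
        # Keep only the stages that are still not ready.
        pending = {stage for stage in pending if not is_ready(stage)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
    return True


# All three stages ready immediately: succeeds well within the timeout.
print(wait_for_stages(lambda stage: True, num_stages=3, timeout_s=1.0))

# Stage 2 never becomes ready: gives up after roughly 0.05 seconds.
print(wait_for_stages(lambda stage: stage != 2, num_stages=3, timeout_s=0.05))
```

Because the parameter is named `timeout_s`, a caller passing 600 gets a ten-minute bound, which matches the unit printed in the log message.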


Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@tzhouam added the "ready" label (label to trigger buildkite CI) on Feb 3, 2026
@david6666666 enabled auto-merge (squash) on February 3, 2026 08:01
@david6666666 merged commit d6f93b0 into vllm-project:main on Feb 3, 2026
7 checks passed
futurenitian pushed a commit to futurenitian/vllm-omni that referenced this pull request Feb 4, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: future fu <3172516720@qq.com>
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>