[Debug] Correct Unreasonable Long Timeout #1175
Merged
david6666666 merged 1 commit into vllm-project:main on Feb 3, 2026
Conversation
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
david6666666 approved these changes on Feb 3, 2026
futurenitian pushed a commit to futurenitian/vllm-omni that referenced this pull request on Feb 4, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: future fu <3172516720@qq.com>
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request on Feb 20, 2026
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Purpose
Currently, some timeouts are unreasonably long (e.g., 60,000 seconds, or approximately 16.7 hours). This appears to result from a unit mismatch between seconds and milliseconds: a value intended as 60,000 milliseconds (60 s) is being interpreted as seconds. This PR identifies and corrects these thresholds in the online serving and diffusion RPC configurations.
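The failure mode can be sketched as follows. This is an illustrative example only: the constant and function names here are hypothetical and are not the actual vllm-omni identifiers touched by this PR.

```python
# Illustrative sketch of a seconds/milliseconds unit mismatch.
# Names below (DEFAULT_RPC_TIMEOUT_MS, to_seconds) are hypothetical,
# not the actual vllm-omni code.
DEFAULT_RPC_TIMEOUT_MS = 60_000  # intended meaning: 60 seconds


def to_seconds(timeout_ms: int) -> float:
    """Convert a millisecond timeout to the seconds a blocking API expects."""
    return timeout_ms / 1000.0


# Bug: the raw millisecond value is passed directly to a seconds-based API,
# so a 60-second timeout silently becomes a ~16.7-hour one.
buggy_wait_s = float(DEFAULT_RPC_TIMEOUT_MS)       # 60000.0 s
# Fix: convert units before passing the value on.
fixed_wait_s = to_seconds(DEFAULT_RPC_TIMEOUT_MS)  # 60.0 s
```

A timeout this long effectively disables the timeout in practice, since callers hang for hours instead of failing fast.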
Test Plan

```
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8092
```

Test Result

```
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] █ █ █▄ ▄█
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] ▄▄ ▄█ █ █ █ ▀▄▀ █    version 0.15.0
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] █▄█▀ █ █ █ █         model Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:325]
(APIServer pid=40636) INFO 02-03 07:16:48 [utils.py:261] non-default args: {'model_tag': 'Qwen/Qwen3-Omni-30B-A3B-Instruct', 'port': 8092, 'model': 'Qwen/Qwen3-Omni-30B-A3B-Instruct'}
(APIServer pid=40636) INFO 02-03 07:16:48 [omni.py:119] Initializing stages for model: Qwen/Qwen3-Omni-30B-A3B-Instruct
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_interleaved', 'mrope_section'}
(APIServer pid=40636) Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'interleaved', 'mrope_section'}
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:215] Auto-configuring SharedMemoryConnector for edge ('1', '2')
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:234] Loaded OmniTransferConfig with 2 connector configurations
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [initialization.py:60] Created connector for 1 -> 2: SharedMemoryConnector
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.9, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'hf_config_name': 'thinker_config', 'tensor_parallel_size': 1, 'max_num_seqs': 64, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 64}, 'engine_args': {'model_stage': 'talker', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.6, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'latent', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'distributed_executor_backend': 'mp', 'hf_config_name': 'talker_config', 'max_num_seqs': 64, 'async_chunk': False}, 'engine_input_source': [0], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker', 'default_sampling_params': {'temperature': 0.9, 'top_k': 50, 'max_tokens': 4096, 'seed': 42, 'detokenize': False, 'repetition_penalty': 1.05, 'stop_token_ids': [2150]}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 2, 'stage_type': 'llm', 'runtime': {'devices': '1', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'code2wav', 'model_arch': 'Qwen3OmniMoeForConditionalGeneration', 'worker_type': 'generation', 'scheduler_cls': 'vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler', 'enforce_eager': True, 'trust_remote_code': True, 'async_scheduling': False, 'enable_prefix_caching': False, 'engine_output_type': 'audio', 'gpu_memory_utilization': 0.1, 'distributed_executor_backend': 'mp', 'max_num_batched_tokens': 1000000, 'hf_config_name': 'thinker_config', 'max_num_seqs': 1, 'async_chunk': False}, 'engine_input_source': [1], 'custom_process_input_func': 'vllm_omni.model_executor.stage_input_processors.qwen3_omni.talker2code2wav', 'final_output': True, 'final_output_type': 'audio', 'default_sampling_params': {'temperature': 0.0, 'top_p': 1.0, 'top_k': -1, 'max_tokens': 65536, 'seed': 42, 'detokenize': True, 'repetition_penalty': 1.1}}
(APIServer pid=40636) INFO 02-03 07:16:50 [omni.py:338] [AsyncOrchestrator] Waiting for 3 stages to initialize (timeout: 600s)
[Stage-2] INFO 02-03 07:20:20 [initialization.py:288] [Stage-2] Initializing OmniConnectors with config keys: ['from_stage_1']
[Stage-2] INFO 02-03 07:20:20 [factory.py:46] Created connector: SharedMemoryConnector
```