Status: Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py` was not provided.
🐛 Describe the bug
Steps to Reproduce
1. Start the service (vLLM version: 0.12.0, vllm_omni: main, commit d51f4c0).
2. Use the following stage configuration:
```yaml
stage_args:
- stage_id: 0
  runtime:
    process: true    # Run this stage in a separate process
    devices: "0"     # Visible devices for this stage (CUDA_VISIBLE_DEVICES/torch.cuda.set_device)
    max_batch_size: 50
  engine_args:
    model_stage: thinker
    model_arch: Qwen2_5OmniForConditionalGeneration
    worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
    scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
    max_model_len: 896
    max_num_batched_tokens: 896
    max_num_seqs: 1
    gpu_memory_utilization: 0.8
    skip_mm_profiling: true
    enforce_eager: true    # Now we only support eager mode
    trust_remote_code: true
    engine_output_type: latent
    enable_prefix_caching: false
    is_comprehension: true
    final_output: true
    final_output_type: text
  default_sampling_params:
    temperature: 0.0
    top_p: 1.0
    top_k: -1
    max_tokens: 128
    seed: 42
    detokenize: True
    repetition_penalty: 1.1
- stage_id: 1
  runtime:
    process: true
    devices: "1"
    max_batch_size: 50
  engine_args:
    model_stage: talker
    model_arch: Qwen2_5OmniForConditionalGeneration
    worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
    scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
    max_model_len: 896
    max_num_batched_tokens: 896
    max_num_seqs: 1
    gpu_memory_utilization: 0.8
    skip_mm_profiling: true
    enforce_eager: true
    trust_remote_code: true
    enable_prefix_caching: false
    engine_output_type: latent
    engine_input_source: [0]
    custom_process_input_func: vllm_omni.model_executor.stage_input_processors.qwen2_5_omni.thinker2talker
  default_sampling_params:
    temperature: 0.9
    top_p: 0.8
    top_k: 40
    max_tokens: 128
    seed: 42
    detokenize: True
    repetition_penalty: 1.05
    stop_token_ids: [8294]
- stage_id: 2
  runtime:
    process: true
    devices: "0"    # Example: use a different GPU than the previous stage; use "0" if single GPU
    max_batch_size: 50
  engine_args:
    model_stage: code2wav
    model_arch: Qwen2_5OmniForConditionalGeneration
    worker_cls: vllm_omni.worker.gpu_generation_worker.GPUGenerationWorker
    scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
    gpu_memory_utilization: 0.15
    enforce_eager: true
    trust_remote_code: true
    enable_prefix_caching: false
    engine_output_type: audio
    engine_input_source: [1]
    final_output: true
    final_output_type: audio
  default_sampling_params:
    temperature: 0.0
    top_p: 1.0
    top_k: -1
    max_tokens: 128
    seed: 42
    detokenize: True
    repetition_penalty: 1.1
```
3. Run the benchmark command:

```shell
vllm bench serve --omni --dataset-name random-mm --port 48569 --model xxxx/Qwen2.5-Omni-7B --endpoint /v1/chat/completions --backend openai-chat --request-rate 1 --num-prompts 100 --random-input-len 10 --random-range-ratio 0.0 --random-mm-base-items-per-request 2 --random-mm-num-mm-items-range-ratio 0 --random-mm-limit-mm-per-prompt '{"image":1,"video":1, "audio": 1}' --random-mm-bucket-config '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)":0.4}' --ignore-eos --percentile-metrics ttft,tpot,itl,e2el --random-output-len 2 --ready-check-timeout-sec 100
```
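For reference, the `--random-mm-bucket-config` value above maps `(height, width, frames)` buckets to sampling probabilities, which should sum to 1. A quick plain-Python sanity check of the exact value passed (illustrative only, not part of the reproduction):

```python
import ast
import math

# The value passed to --random-mm-bucket-config above, verbatim:
raw = '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)":0.4}'
bucket_config = ast.literal_eval(raw)  # dict: "(H, W, frames)" -> sampling probability

total = sum(bucket_config.values())
assert math.isclose(total, 1.0), f"bucket probabilities sum to {total}, expected 1.0"
```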
Server Error
(EngineCore_DP0 pid=29327) INFO 12-25 11:38:12 [qwen2_5_omni.py:911] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
/pytorch/aten/src/ATen/native/cuda/IndexKernelUtils.cu:16: vectorized_gather_kernel: block: [28,0,0], thread: [63,0,0] Assertion `ind >=0 && ind < ind_dim_size && "vectorized gather kernel index out of bounds"` failed.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:72] Dumping input data for V1 LLM engine (v0.12.0) with config: model='/ms_test2/models/Qwen2.5-Omni-7B', speculative_config=None, tokenizer='/ms_test2/models/Qwen2.5-Omni-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01), seed=0, served_model_name=/ms_test2/models/Qwen2.5-Omni-7B, enable_prefix_caching=False, enable_chunked_prefill=False, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 
'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>}, 'local_cache_dir': None},
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:79] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-bench-4ce01ab2-19,prompt_token_ids_len=128,mm_features=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.1, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=42, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=(),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_req_ids=[], new_token_ids=[], all_token_ids={}, new_block_ids=[], num_computed_tokens=[], num_output_tokens=[]), num_scheduled_tokens={chatcmpl-bench-4ce01ab2-19: 128}, total_num_scheduled_tokens=128, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[], finished_req_ids=[], free_encoder_mm_hashes=[], preempted_req_ids=[], pending_structured_output_tokens=false, kv_connector_metadata=null, ec_connector_metadata=null)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={})
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Traceback (most recent call last):
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 836, in run_engine_core
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] engine_core.run_busy_loop()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 863, in run_busy_loop
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] self._process_engine_step()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 892, in _process_engine_step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 346, in step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] model_output = future.result()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self.__get_result()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] raise self._exception
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 369, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 591, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] output = self.model_runner.execute_model(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 132, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] outputs = self._run_generation_model(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 216, in _run_generation_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self.model.forward(**kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 315, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] audio_tensor = self.generate_audio(code, voice_type)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 511, in generate_audio
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] audio_tensor = self._codec_to_audio(code_tensor, voice_type)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 931, in _codec_to_audio
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] _, audio_chunk = self.token2wav.process_chunk(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1820, in process_chunk
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] _mel, out = self.process_little_chunk(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1800, in process_little_chunk
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] mel = self.token2wav(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1468, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] mel_spectrogram = self.code2wav_dit_model.sample(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1311, in sample
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] solution_trajectory = ode_solver.integrate(time_embedding)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1140, in integrate
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] delta_value, _ = self._compute_step(self.function, time_start, time_step, time_end, current_value)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1106, in _compute_step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] function_value_start = function(time_start, value_start)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1287, in ode_function
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] model_output = self(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1243, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] hidden_states = transformer_block(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 643, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] attn_output = self.attn(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 571, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] query[:, :1], key[:, :1] = apply_rotary_pos_emb(query[:, :1], key[:, :1], cos, sin)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 526, in apply_rotary_pos_emb
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] k_embed = (k * cos) + (rotate_half_codec(k) * sin)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] torch.AcceleratorError: CUDA error: device-side assert triggered
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]
(EngineCore_DP0 pid=29327) Process EngineCore_DP0:
ERROR 12-25 11:38:12 [async_llm.py:546] AsyncLLM output_handler failed.
ERROR 12-25 11:38:12 [async_llm.py:546] Traceback (most recent call last):
ERROR 12-25 11:38:12 [async_llm.py:546] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 498, in output_handler
ERROR 12-25 11:38:12 [async_llm.py:546] outputs = await engine_core.get_output_async()
ERROR 12-25 11:38:12 [async_llm.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [async_llm.py:546] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 885, in get_output_async
ERROR 12-25 11:38:12 [async_llm.py:546] raise self._format_exception(outputs) from None
ERROR 12-25 11:38:12 [async_llm.py:546] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=29327) Traceback (most recent call last):
(EngineCore_DP0 pid=29327) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=29327) self.run()
(EngineCore_DP0 pid=29327) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=29327) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 847, in run_engine_core
(EngineCore_DP0 pid=29327) raise e
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 836, in run_engine_core
(EngineCore_DP0 pid=29327) engine_core.run_busy_loop()
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 863, in run_busy_loop
(EngineCore_DP0 pid=29327) self._process_engine_step()
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 892, in _process_engine_step
(EngineCore_DP0 pid=29327) outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 346, in step
(EngineCore_DP0 pid=29327) model_output = future.result()
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=29327) return self.__get_result()
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=29327) raise self._exception
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(EngineCore_DP0 pid=29327) result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 369, in execute_model
(EngineCore_DP0 pid=29327) return self.worker.execute_model(scheduler_output, *args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 591, in execute_model
(EngineCore_DP0 pid=29327) output = self.model_runner.execute_model(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 132, in execute_model
(EngineCore_DP0 pid=29327) outputs = self._run_generation_model(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 216, in _run_generation_model
(EngineCore_DP0 pid=29327) return self.model.forward(**kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 315, in forward
(EngineCore_DP0 pid=29327) audio_tensor = self.generate_audio(code, voice_type)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 511, in generate_audio
(EngineCore_DP0 pid=29327) audio_tensor = self._codec_to_audio(code_tensor, voice_type)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 931, in _codec_to_audio
(EngineCore_DP0 pid=29327) _, audio_chunk = self.token2wav.process_chunk(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1820, in process_chunk
(EngineCore_DP0 pid=29327) _mel, out = self.process_little_chunk(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278] [Stage-2] Failed on request chatcmpl-bench-4ce01ab2-19: EngineCore encountered an issue. See stack trace (above) for the root cause.
ERROR 12-25 11:38:12 [omni_stage.py:1278] Traceback (most recent call last):
ERROR 12-25 11:38:12 [omni_stage.py:1278] File "/workspace/vllm-omni-test/vllm_omni/entrypoints/omni_stage.py", line 1184, in _stage_worker_async
ERROR 12-25 11:38:12 [omni_stage.py:1278] async for res in stage_engine.generate(ein, sampling_params, rid):
ERROR 12-25 11:38:12 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 444, in generate
ERROR 12-25 11:38:12 [omni_stage.py:1278] out = q.get_nowait() or await q.get()
ERROR 12-25 11:38:12 [omni_stage.py:1278] ^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/output_processor.py", line 70, in get
ERROR 12-25 11:38:12 [omni_stage.py:1278] raise output
ERROR 12-25 11:38:12 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 498, in output_handler
ERROR 12-25 11:38:12 [omni_stage.py:1278] outputs = await engine_core.get_output_async()
ERROR 12-25 11:38:12 [omni_stage.py:1278] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 885, in get_output_async
ERROR 12-25 11:38:12 [omni_stage.py:1278] raise self._format_exception(outputs) from None
ERROR 12-25 11:38:12 [omni_stage.py:1278] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1800, in process_little_chunk
(EngineCore_DP0 pid=29327) mel = self.token2wav(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1468, in forward
(EngineCore_DP0 pid=29327) mel_spectrogram = self.code2wav_dit_model.sample(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1311, in sample
(EngineCore_DP0 pid=29327) solution_trajectory = ode_solver.integrate(time_embedding)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1140, in integrate
(EngineCore_DP0 pid=29327) delta_value, _ = self._compute_step(self.function, time_start, time_step, time_end, current_value)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1106, in _compute_step
(EngineCore_DP0 pid=29327) function_value_start = function(time_start, value_start)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1287, in ode_function
(EngineCore_DP0 pid=29327) model_output = self(
(EngineCore_DP0 pid=29327) ^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1243, in forward
(EngineCore_DP0 pid=29327) hidden_states = transformer_block(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 643, in forward
(EngineCore_DP0 pid=29327) attn_output = self.attn(
(EngineCore_DP0 pid=29327) ^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 571, in forward
(EngineCore_DP0 pid=29327) query[:, :1], key[:, :1] = apply_rotary_pos_emb(query[:, :1], key[:, :1], cos, sin)
(EngineCore_DP0 pid=29327) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 526, in apply_rotary_pos_emb
(EngineCore_DP0 pid=29327) k_embed = (k * cos) + (rotate_half_codec(k) * sin)
(EngineCore_DP0 pid=29327) ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(EngineCore_DP0 pid=29327) torch.AcceleratorError: CUDA error: device-side assert triggered
(EngineCore_DP0 pid=29327) Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=29327) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=29327) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=29327) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=29327)
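For context on where the assert fires: line 526 of `qwen2_5_omni_token2wav.py` is inside `apply_rotary_pos_emb`, which applies rotary position embeddings to the query/key slices. The sketch below is NOT the vllm_omni source, just the standard RoPE pattern these function names follow; a device-side assert at this point usually indicates that `cos`/`sin` were gathered with position indices outside the precomputed RoPE cache (rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the log suggests, should pinpoint the offending kernel).

```python
import torch


def rotate_half_codec(x: torch.Tensor) -> torch.Tensor:
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1)
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin):
    # The failing line computes exactly this elementwise form; the assert
    # itself comes from an upstream kernel (e.g. an out-of-range gather
    # building cos/sin), reported asynchronously at this call site.
    q_embed = (q * cos) + (rotate_half_codec(q) * sin)
    k_embed = (k * cos) + (rotate_half_codec(k) * sin)
    return q_embed, k_embed
```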
(APIServer pid=27249) ERROR 12-25 11:38:12 [async_omni.py:390] Stage 2 error on request chatcmpl-bench-4ce01ab2-19: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=27249) INFO: 127.0.0.1:56822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[rank0]:[W1225 11:38:13.228179462 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
--------------------------------
[Stage-0] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-22
--------------------------------
--------------------------------
[Stage-0] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-23
--------------------------------
--------------------------------
[Stage-1] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-14
--------------------------------
--------------------------------
[Stage-2] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-10
--------------------------------
ERROR 12-25 11:38:15 [omni_stage.py:1278] [Stage-2] Failed on request chatcmpl-bench-4ce01ab2-10: EngineCore encountered an issue. See stack trace (above) for the root cause.
ERROR 12-25 11:38:15 [omni_stage.py:1278] Traceback (most recent call last):
ERROR 12-25 11:38:15 [omni_stage.py:1278] File "/workspace/vllm-omni-test/vllm_omni/entrypoints/omni_stage.py", line 1184, in _stage_worker_async
ERROR 12-25 11:38:15 [omni_stage.py:1278] async for res in stage_engine.generate(ein, sampling_params, rid):
ERROR 12-25 11:38:15 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 426, in generate
ERROR 12-25 11:38:15 [omni_stage.py:1278] q = await self.add_request(
ERROR 12-25 11:38:15 [omni_stage.py:1278] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:15 [omni_stage.py:1278] File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 296, in add_request
ERROR 12-25 11:38:15 [omni_stage.py:1278] raise EngineDeadError()
ERROR 12-25 11:38:15 [omni_stage.py:1278] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=27249) ERROR 12-25 11:38:15 [async_omni.py:390] Stage 2 error on request chatcmpl-bench-4ce01ab2-10: EngineCore encountered an issue. See stack trace (above) for the root cause.