
[Bug]: Qwen2.5-Omni, request rate=1, mixed (image+video+audio) requests: after a dozen or so requests, a torch.AcceleratorError (CUDA error) is raised and the service exits abnormally #476

Description

@yenuo26

Your current environment

The output of `python collect_env.py` was not provided (template placeholder left unfilled).

🐛 Describe the bug

Steps to Reproduce

1. Start the service (vllm version: 0.12.0, vllm_omni: main d51f4c0).
2. Use the following stage configuration:
```yaml
stage_args:
  - stage_id: 0
    runtime:
      process: true        # Run this stage in a separate process
      devices: "0"         # Visible devices for this stage (CUDA_VISIBLE_DEVICES/torch.cuda.set_device)
      max_batch_size: 50
    engine_args:
      model_stage: thinker
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      max_model_len: 896
      max_num_batched_tokens: 896
      max_num_seqs: 1
      gpu_memory_utilization: 0.8
      skip_mm_profiling: true
      enforce_eager: true  # Now we only support eager mode
      trust_remote_code: true
      engine_output_type: latent
      enable_prefix_caching: false
      is_comprehension: true
      final_output: true
      final_output_type: text
    default_sampling_params:
      temperature: 0.0
      top_p: 1.0
      top_k: -1
      max_tokens: 128
      seed: 42
      detokenize: True
      repetition_penalty: 1.1
  - stage_id: 1
    runtime:
      process: true
      devices: "1"
      max_batch_size: 50
    engine_args:
      model_stage: talker
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      max_model_len: 896
      max_num_batched_tokens: 896
      max_num_seqs: 1
      gpu_memory_utilization: 0.8
      skip_mm_profiling: true
      enforce_eager: true
      trust_remote_code: true
      enable_prefix_caching: false
      engine_output_type: latent
      engine_input_source: [0]
      custom_process_input_func: vllm_omni.model_executor.stage_input_processors.qwen2_5_omni.thinker2talker
    default_sampling_params:
      temperature: 0.9
      top_p: 0.8
      top_k: 40
      max_tokens: 128
      seed: 42
      detokenize: True
      repetition_penalty: 1.05
      stop_token_ids: [8294]
  - stage_id: 2
    runtime:
      process: true
      devices: "0"         # Example: use a different GPU than the previous stage; use "0" if single GPU
      max_batch_size: 50
    engine_args:
      model_stage: code2wav
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_generation_worker.GPUGenerationWorker
      scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
      gpu_memory_utilization: 0.15
      enforce_eager: true
      trust_remote_code: true
      enable_prefix_caching: false
      engine_output_type: audio
      engine_input_source: [1]
      final_output: true
      final_output_type: audio
    default_sampling_params:
      temperature: 0.0
      top_p: 1.0
      top_k: -1
      max_tokens: 128
      seed: 42
      detokenize: True
      repetition_penalty: 1.1
```
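Not part of the repro steps, but a quick sanity check of the stage layout above: a minimal sketch assuming the config is saved as `stages.yaml` (hypothetical filename) with PyYAML available. It only prints which GPU each stage lands on; with the values above, the thinker (stage 0) and code2wav (stage 2) share GPU 0 while the talker (stage 1) runs on GPU 1, and only stage 2's memory fraction is lowered to 0.15.

```python
# Sanity-check sketch only: load the stage config and print GPU placement per stage.
# Assumes the YAML above is saved as "stages.yaml" (hypothetical name); requires PyYAML.
import yaml

with open("stages.yaml") as f:
    cfg = yaml.safe_load(f)

for stage in cfg["stage_args"]:
    runtime = stage.get("runtime", {})
    engine = stage.get("engine_args", {})
    print(
        f"stage {stage['stage_id']} ({engine.get('model_stage', '?')}): "
        f"devices={runtime.get('devices')} "
        f"gpu_memory_utilization={engine.get('gpu_memory_utilization')}"
    )

# Expected output for the config above:
#   stage 0 (thinker): devices=0 gpu_memory_utilization=0.8
#   stage 1 (talker): devices=1 gpu_memory_utilization=0.8
#   stage 2 (code2wav): devices=0 gpu_memory_utilization=0.15
```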

3. Run the benchmark command:

```bash
vllm bench serve --omni --dataset-name random-mm --port 48569 --model xxxx/Qwen2.5-Omni-7B --endpoint /v1/chat/completions --backend openai-chat --request-rate 1 --num-prompts 100 --random-input-len 10 --random-range-ratio 0.0 --random-mm-base-items-per-request 2 --random-mm-num-mm-items-range-ratio 0 --random-mm-limit-mm-per-prompt '{"image":1,"video":1, "audio": 1}' --random-mm-bucket-config '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)":0.4}' --ignore-eos --percentile-metrics ttft,tpot,itl,e2el --random-output-len 2 --ready-check-timeout-sec 100
```

Server Error


(EngineCore_DP0 pid=29327) INFO 12-25 11:38:12 [qwen2_5_omni.py:911] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.

/pytorch/aten/src/ATen/native/cuda/IndexKernelUtils.cu:16: vectorized_gather_kernel: block: [28,0,0], thread: [63,0,0] Assertion `ind >=0 && ind < ind_dim_size && "vectorized gather kernel index out of bounds"` failed.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:72] Dumping input data for V1 LLM engine (v0.12.0) with config: model='/ms_test2/models/Qwen2.5-Omni-7B', speculative_config=None, tokenizer='/ms_test2/models/Qwen2.5-Omni-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01), seed=0, served_model_name=/ms_test2/models/Qwen2.5-Omni-7B, enable_prefix_caching=False, enable_chunked_prefill=False, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>}, 'local_cache_dir': None},
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:79] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-bench-4ce01ab2-19,prompt_token_ids_len=128,mm_features=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.1, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=42, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=(),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_req_ids=[], new_token_ids=[], all_token_ids={}, new_block_ids=[], num_computed_tokens=[], num_output_tokens=[]), num_scheduled_tokens={chatcmpl-bench-4ce01ab2-19: 128}, total_num_scheduled_tokens=128, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[], finished_req_ids=[], free_encoder_mm_hashes=[], preempted_req_ids=[], pending_structured_output_tokens=false, kv_connector_metadata=null, ec_connector_metadata=null)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={})
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Traceback (most recent call last):
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 836, in run_engine_core
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 863, in run_busy_loop
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     self._process_engine_step()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 892, in _process_engine_step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 346, in step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     model_output = future.result()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                    ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self.__get_result()
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     raise self._exception
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 369, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self.worker.execute_model(scheduler_output, *args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 591, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     output = self.model_runner.execute_model(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 132, in execute_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     outputs = self._run_generation_model(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 216, in _run_generation_model
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self.model.forward(**kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 315, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     audio_tensor = self.generate_audio(code, voice_type)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 511, in generate_audio
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     audio_tensor = self._codec_to_audio(code_tensor, voice_type)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 931, in _codec_to_audio
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     _, audio_chunk = self.token2wav.process_chunk(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1820, in process_chunk
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     _mel, out = self.process_little_chunk(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1800, in process_little_chunk
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     mel = self.token2wav(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]           ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1468, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     mel_spectrogram = self.code2wav_dit_model.sample(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1311, in sample
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     solution_trajectory = ode_solver.integrate(time_embedding)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1140, in integrate
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     delta_value, _ = self._compute_step(self.function, time_start, time_step, time_end, current_value)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1106, in _compute_step
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     function_value_start = function(time_start, value_start)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1287, in ode_function
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     model_output = self(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                    ^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1243, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     hidden_states = transformer_block(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                     ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 643, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     attn_output = self.attn(
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                   ^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 571, in forward
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     query[:, :1], key[:, :1] = apply_rotary_pos_emb(query[:, :1], key[:, :1], cos, sin)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 526, in apply_rotary_pos_emb
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]     k_embed = (k * cos) + (rotate_half_codec(k) * sin)
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]               ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] torch.AcceleratorError: CUDA error: device-side assert triggered
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=29327) ERROR 12-25 11:38:12 [core.py:845]
(EngineCore_DP0 pid=29327) Process EngineCore_DP0:
ERROR 12-25 11:38:12 [async_llm.py:546] AsyncLLM output_handler failed.
ERROR 12-25 11:38:12 [async_llm.py:546] Traceback (most recent call last):
ERROR 12-25 11:38:12 [async_llm.py:546]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 498, in output_handler
ERROR 12-25 11:38:12 [async_llm.py:546]     outputs = await engine_core.get_output_async()
ERROR 12-25 11:38:12 [async_llm.py:546]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [async_llm.py:546]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 885, in get_output_async
ERROR 12-25 11:38:12 [async_llm.py:546]     raise self._format_exception(outputs) from None
ERROR 12-25 11:38:12 [async_llm.py:546] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=29327) Traceback (most recent call last):
(EngineCore_DP0 pid=29327)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=29327)     self.run()
(EngineCore_DP0 pid=29327)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=29327)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 847, in run_engine_core
(EngineCore_DP0 pid=29327)     raise e
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 836, in run_engine_core
(EngineCore_DP0 pid=29327)     engine_core.run_busy_loop()
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 863, in run_busy_loop
(EngineCore_DP0 pid=29327)     self._process_engine_step()
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 892, in _process_engine_step
(EngineCore_DP0 pid=29327)     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=29327)                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 346, in step
(EngineCore_DP0 pid=29327)     model_output = future.result()
(EngineCore_DP0 pid=29327)                    ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=29327)     return self.__get_result()
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=29327)     raise self._exception
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(EngineCore_DP0 pid=29327)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=29327)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 369, in execute_model
(EngineCore_DP0 pid=29327)     return self.worker.execute_model(scheduler_output, *args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 591, in execute_model
(EngineCore_DP0 pid=29327)     output = self.model_runner.execute_model(
(EngineCore_DP0 pid=29327)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 132, in execute_model
(EngineCore_DP0 pid=29327)     outputs = self._run_generation_model(
(EngineCore_DP0 pid=29327)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/worker/gpu_generation_model_runner.py", line 216, in _run_generation_model
(EngineCore_DP0 pid=29327)     return self.model.forward(**kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 315, in forward
(EngineCore_DP0 pid=29327)     audio_tensor = self.generate_audio(code, voice_type)
(EngineCore_DP0 pid=29327)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 511, in generate_audio
(EngineCore_DP0 pid=29327)     audio_tensor = self._codec_to_audio(code_tensor, voice_type)
(EngineCore_DP0 pid=29327)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py", line 931, in _codec_to_audio
(EngineCore_DP0 pid=29327)     _, audio_chunk = self.token2wav.process_chunk(
(EngineCore_DP0 pid=29327)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1820, in process_chunk
(EngineCore_DP0 pid=29327)     _mel, out = self.process_little_chunk(
(EngineCore_DP0 pid=29327)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278] [Stage-2] Failed on request chatcmpl-bench-4ce01ab2-19: EngineCore encountered an issue. See stack trace (above) for the root cause.
ERROR 12-25 11:38:12 [omni_stage.py:1278] Traceback (most recent call last):
ERROR 12-25 11:38:12 [omni_stage.py:1278]   File "/workspace/vllm-omni-test/vllm_omni/entrypoints/omni_stage.py", line 1184, in _stage_worker_async
ERROR 12-25 11:38:12 [omni_stage.py:1278]     async for res in stage_engine.generate(ein, sampling_params, rid):
ERROR 12-25 11:38:12 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 444, in generate
ERROR 12-25 11:38:12 [omni_stage.py:1278]     out = q.get_nowait() or await q.get()
ERROR 12-25 11:38:12 [omni_stage.py:1278]                             ^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/output_processor.py", line 70, in get
ERROR 12-25 11:38:12 [omni_stage.py:1278]     raise output
ERROR 12-25 11:38:12 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 498, in output_handler
ERROR 12-25 11:38:12 [omni_stage.py:1278]     outputs = await engine_core.get_output_async()
ERROR 12-25 11:38:12 [omni_stage.py:1278]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:12 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 885, in get_output_async
ERROR 12-25 11:38:12 [omni_stage.py:1278]     raise self._format_exception(outputs) from None
ERROR 12-25 11:38:12 [omni_stage.py:1278] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1800, in process_little_chunk
(EngineCore_DP0 pid=29327)     mel = self.token2wav(
(EngineCore_DP0 pid=29327)           ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1468, in forward
(EngineCore_DP0 pid=29327)     mel_spectrogram = self.code2wav_dit_model.sample(
(EngineCore_DP0 pid=29327)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=29327)     return func(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1311, in sample
(EngineCore_DP0 pid=29327)     solution_trajectory = ode_solver.integrate(time_embedding)
(EngineCore_DP0 pid=29327)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1140, in integrate
(EngineCore_DP0 pid=29327)     delta_value, _ = self._compute_step(self.function, time_start, time_step, time_end, current_value)
(EngineCore_DP0 pid=29327)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1106, in _compute_step
(EngineCore_DP0 pid=29327)     function_value_start = function(time_start, value_start)
(EngineCore_DP0 pid=29327)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1287, in ode_function
(EngineCore_DP0 pid=29327)     model_output = self(
(EngineCore_DP0 pid=29327)                    ^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 1243, in forward
(EngineCore_DP0 pid=29327)     hidden_states = transformer_block(
(EngineCore_DP0 pid=29327)                     ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 643, in forward
(EngineCore_DP0 pid=29327)     attn_output = self.attn(
(EngineCore_DP0 pid=29327)                   ^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=29327)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=29327)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=29327)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 571, in forward
(EngineCore_DP0 pid=29327)     query[:, :1], key[:, :1] = apply_rotary_pos_emb(query[:, :1], key[:, :1], cos, sin)
(EngineCore_DP0 pid=29327)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=29327)   File "/workspace/vllm-omni-test/vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni_token2wav.py", line 526, in apply_rotary_pos_emb
(EngineCore_DP0 pid=29327)     k_embed = (k * cos) + (rotate_half_codec(k) * sin)
(EngineCore_DP0 pid=29327)               ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(EngineCore_DP0 pid=29327) torch.AcceleratorError: CUDA error: device-side assert triggered
(EngineCore_DP0 pid=29327) Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=29327) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=29327) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=29327) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=29327)
(APIServer pid=27249) ERROR 12-25 11:38:12 [async_omni.py:390] Stage 2 error on request chatcmpl-bench-4ce01ab2-19: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=27249) INFO:     127.0.0.1:56822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[rank0]:[W1225 11:38:13.228179462 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
--------------------------------
[Stage-0] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-22
--------------------------------
--------------------------------
[Stage-0] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-23
--------------------------------
--------------------------------
[Stage-1] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-14
--------------------------------
--------------------------------
[Stage-2] Received batch size=1, request_ids=chatcmpl-bench-4ce01ab2-10
--------------------------------
ERROR 12-25 11:38:15 [omni_stage.py:1278] [Stage-2] Failed on request chatcmpl-bench-4ce01ab2-10: EngineCore encountered an issue. See stack trace (above) for the root cause.
ERROR 12-25 11:38:15 [omni_stage.py:1278] Traceback (most recent call last):
ERROR 12-25 11:38:15 [omni_stage.py:1278]   File "/workspace/vllm-omni-test/vllm_omni/entrypoints/omni_stage.py", line 1184, in _stage_worker_async
ERROR 12-25 11:38:15 [omni_stage.py:1278]     async for res in stage_engine.generate(ein, sampling_params, rid):
ERROR 12-25 11:38:15 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 426, in generate
ERROR 12-25 11:38:15 [omni_stage.py:1278]     q = await self.add_request(
ERROR 12-25 11:38:15 [omni_stage.py:1278]         ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-25 11:38:15 [omni_stage.py:1278]   File "/workspace/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 296, in add_request
ERROR 12-25 11:38:15 [omni_stage.py:1278]     raise EngineDeadError()
ERROR 12-25 11:38:15 [omni_stage.py:1278] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=27249) ERROR 12-25 11:38:15 [async_omni.py:390] Stage 2 error on request chatcmpl-bench-4ce01ab2-10: EngineCore encountered an issue. See stack trace (above) for the root cause.
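
For context (not a confirmed root cause): the first line of the failure is the device-side assertion from `vectorized_gather_kernel` (`ind >= 0 && ind < ind_dim_size`), which is PyTorch's generic bounds check for gather/indexing kernels. Since CUDA errors are reported asynchronously (as the traceback itself notes), the Python frame at `apply_rotary_pos_emb` in `qwen2_5_omni_token2wav.py` may not be the exact op whose index went out of range; re-running with `CUDA_LAUNCH_BLOCKING=1` should pin it down. A minimal, standalone sketch of this failure class, with purely illustrative names (this is not vllm_omni code):

```python
# Illustration of the failure class only: an index/gather lookup whose index exceeds the
# indexed dimension. On CUDA this trips the same device-side assert seen above and poisons
# the context, so every subsequent CUDA call fails (matching the EngineDeadError cascade).
import torch

table_len, dim = 896, 64                     # e.g. a precomputed rotary cos/sin table
cos_table = torch.randn(table_len, dim)

positions = torch.tensor([0, 10, 900])       # 900 >= 896 -> out of bounds

# Checking on the host makes the bad index visible without killing the process;
# cos_table.cuda()[positions.cuda()] would instead fire the device-side assert.
bad = positions[(positions < 0) | (positions >= table_len)]
if bad.numel():
    print(f"out-of-bounds positions: {bad.tolist()} (table length {table_len})")
else:
    cos = cos_table[positions]               # safe lookup
```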

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
