Your current environment
vllm version 0.14.0
vllm-omni version 0.14.0rc1
torch version 2.9.1
Your code version
The commit id or version of vllm
0.14.0
The commit id or version of vllm-omni
0.14.0rc1
🐛 Describe the bug
Launch command (/data/LLM/Qwen3-TTS-12Hz-1.7B-CustomVoice is a local path):
vllm-omni serve /data/LLM/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --served-model-name Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --stage-configs-path qwen3_tts.yaml \
  --host 0.0.0.0 \
  --port 8800 \
  --gpu-memory-utilization 0.5 \
  --trust-remote-code \
  --enforce-eager \
  --omni
Contents of qwen3_tts.yaml:
stage_args:
  - stage_id: 0
    stage_type: llm  # Use llm stage type to launch OmniLLM
    runtime:
      devices: "0"
      max_batch_size: 1
    engine_args:
      model_stage: qwen3_tts
      model_arch: Qwen3TTSForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_generation_worker.GPUGenerationWorker
      scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
      enforce_eager: true
      trust_remote_code: true
      async_scheduling: false
      enable_prefix_caching: false
      engine_output_type: audio  # Final output: audio waveform
      gpu_memory_utilization: 0.1
      distributed_executor_backend: "mp"
      max_num_batched_tokens: 10000
    final_output: true
    final_output_type: audio
Errors are printed at startup, but the server still starts:
[Stage-0] WARNING 01-29 09:58:53 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
[Stage-0] INFO 01-29 09:58:54 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
[Stage-0] INFO 01-29 09:58:54 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
[Stage-0] INFO 01-29 09:58:54 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
[Stage-0] INFO 01-29 09:58:54 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
[Stage-0] INFO 01-29 09:58:54 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
[Stage-0] INFO 01-29 09:59:04 [model.py:530] Resolved architecture: Qwen3TTSForConditionalGeneration
[Stage-0] ERROR 01-29 09:59:04 [repo_utils.py:65] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/data/LLM/Qwen3-TTS-12Hz-1.7B-CustomVoice'. Use repo_type argument if needed., retrying 1 of 2
[Stage-0] ERROR 01-29 09:59:06 [repo_utils.py:63] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/data/LLM/Qwen3-TTS-12Hz-1.7B-CustomVoice'. Use repo_type argument if needed.
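For context on the `repo_utils.py` errors above: Hugging Face repo ids may only take the form `repo_name` or `namespace/repo_name`, so an absolute local path is rejected by the hub lookup even though the weights still load from disk. A rough illustration of the shape check (an approximation for demonstration, not the hub's actual validation code):

```python
import re

# Approximate pattern: one path segment, or "namespace/name" with two.
# A local path like "/data/LLM/..." starts with "/" and has extra
# slashes, so it cannot match — hence the retrying safetensors error.
REPO_ID_RE = re.compile(r"^[\w.-]+(/[\w.-]+)?$")

print(bool(REPO_ID_RE.match("Qwen/Qwen3-TTS")))                          # valid repo id
print(bool(REPO_ID_RE.match("/data/LLM/Qwen3-TTS-12Hz-1.7B-CustomVoice")))  # local path, rejected
```

Since the model is loaded from a local directory anyway, this error should be harmless noise rather than the cause of the audio problem.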
At inference time, running the client:
python openai_speech_client.py \
--text "今天天气真好" \
--voice Ryan \
--instructions "用开心的语气说"
fails with a 400 and this error body:
{"error":{"message":"1 validation error:\n {'type': 'literal_error', 'loc': ('body', 'voice'), 'msg': "Input should be 'alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer' or 'verse'", 'input': 'Ryan', 'ctx': {'expected': "'alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer' or 'verse'"}}\n\n File "/root/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/utils.py", line 709, in create_speech\n POST /v1/audio/speech [{'type': 'literal_error', 'loc': ('body', 'voice'), 'msg': "Input should be 'alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer' or 'verse'", 'input': 'Ryan', 'ctx': {'expected': "'alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer' or 'verse'"}}]","type":"Bad Request","param":null,"code":400}}
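The 400 above is the request schema rejecting `voice` before it reaches the model: the OpenAI-compatible `/v1/audio/speech` endpoint validates `voice` against OpenAI's fixed voice list, so a model speaker name like `Ryan` never passes validation. A minimal sketch of an equivalent check (`validate_voice` is a hypothetical helper for illustration, not vllm's actual code, which uses a pydantic `Literal` field):

```python
# The fixed OpenAI-compatible voice names from the error message above.
ALLOWED_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
}

def validate_voice(voice: str) -> str:
    """Reject any voice name outside the OpenAI-compatible list."""
    if voice not in ALLOWED_VOICES:
        raise ValueError(
            f"voice must be one of {sorted(ALLOWED_VOICES)}, got {voice!r}"
        )
    return voice

print(validate_voice("alloy"))   # accepted
try:
    validate_voice("Ryan")       # model speaker name -> rejected with 400
except ValueError as e:
    print("rejected:", e)
```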
A different test I wrote myself returns 200 OK, but the WAV audio does not match the input text:
import requests

url = "http://localhost:8800/v1/audio/speech"
data = {
    "input": "你好在吗",
    "voice": "alloy",
    "response_format": "wav",
}
response = requests.post(url, json=data)
if response.status_code == 200:
    with open("test_vllm.wav", "wb") as f:
        f.write(response.content)
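One way to narrow down the mismatch is to confirm the 200 response is a structurally valid WAV and check its duration and sample rate against expectations, using the standard-library `wave` module. A sketch with a hypothetical `describe_wav` helper, demonstrated on synthetic silence rather than a live server response:

```python
import io
import wave

def describe_wav(audio_bytes: bytes) -> dict:
    """Return basic properties of a WAV byte stream."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }

# Synthetic stand-in for response.content: 0.5 s of 16-bit mono
# silence at 24 kHz (12000 frames of 2 zero bytes each).
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(b"\x00\x00" * 12000)

print(describe_wav(buf.getvalue()))  # {'channels': 1, 'sample_rate': 24000, 'duration_s': 0.5}
```

If `wave.open` raises or the duration is implausible for the input text, the server is returning malformed or wrong audio rather than the client mis-saving it.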
Is something wrong with my launch command, or am I using the API incorrectly?
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.