
Support Qwen3 tts online serving#968

Merged
Gaohan123 merged 9 commits into vllm-project:main from linyueqian:feature/qwen3-tts-online-serving on Jan 27, 2026

Conversation

@linyueqian (Contributor)


Purpose

This PR adds online serving support for Qwen3-TTS models via the /v1/audio/speech endpoint, addressing Task 1 from RFC #938.

The implementation extends the existing OpenAI-compatible speech API to support Qwen3-TTS specific parameters:

  • CustomVoice: Predefined speaker voices (Vivian, Ryan, etc.) with optional style instructions
  • VoiceDesign: Natural language voice description
  • Base: Voice cloning from reference audio

Key changes:

  • Extended OpenAICreateSpeechRequest with Qwen3-TTS parameters (task_type, language, ref_audio, ref_text, x_vector_only_mode)
  • Updated serving_speech.py to handle Qwen3-TTS prompt format and additional_information
  • Fixed scalar tensor serialization issue for audio sample rate
  • Fixed a @check_model_inputs decorator compatibility issue (warmup failed with the decorator applied)
  • Added example client and documentation
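For reference, the extended request shape described above can be sketched roughly as follows. This is an illustrative dataclass, not the actual Pydantic model merged in this PR; the field names come from the description above, while the types and defaults are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the extended speech request. Field names follow
# the PR description; types and defaults are assumptions, not the exact
# definition merged in this PR.
@dataclass
class SpeechRequestSketch:
    input: str                          # text to synthesize
    voice: Optional[str] = None         # e.g. "Vivian", "Ryan" (CustomVoice)
    instructions: Optional[str] = None  # style or voice-design description
    task_type: str = "CustomVoice"      # "CustomVoice" | "VoiceDesign" | "Base"
    language: str = "Auto"
    ref_audio: Optional[str] = None     # reference audio URL (Base / cloning)
    ref_text: Optional[str] = None      # transcript of the reference audio
    x_vector_only_mode: bool = False

req = SpeechRequestSketch(input="Hello", voice="Ryan", language="English")
```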

Test Plan

1. CustomVoice Task

# Start server
vllm-omni serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --trust-remote-code \
    --enforce-eager \
    --omni

# Test Chinese
curl -X POST http://localhost:8000/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "input": "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
        "voice": "Vivian",
        "language": "Chinese",
        "instructions": "用特别愤怒的语气说"
    }' --output output_customvoice_chinese.wav

# Test English
curl -X POST http://localhost:8000/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "input": "She said she would be here by noon.",
        "voice": "Ryan",
        "language": "English",
        "instructions": "Very happy."
    }' --output output_customvoice_english.wav

2. VoiceDesign Task

# Start server
vllm-omni serve Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --trust-remote-code \
    --enforce-eager \
    --omni

# Test VoiceDesign
curl -X POST http://localhost:8000/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "input": "哥哥,你回来啦,人家等了你好久好久了,要抱抱!",
        "task_type": "VoiceDesign",
        "language": "Chinese",
        "instructions": "体现撒娇稚嫩的萝莉女声,音调偏高且起伏明显,营造出黏人、做作又刻意卖萌的听觉效果。"
    }' --output output_voicedesign.wav

3. Base Task (Voice Clone)

# Start server
vllm-omni serve Qwen/Qwen3-TTS-12Hz-1.7B-Base \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --trust-remote-code \
    --enforce-eager \
    --omni

# Test voice cloning
curl -X POST http://localhost:8000/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "input": "Good one. Okay, fine, I am just gonna leave this sock monkey here. Goodbye.",
        "task_type": "Base",
        "language": "Auto",
        "ref_audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav",
        "ref_text": "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you.",
        "x_vector_only_mode": false
    }' --output output_base_clone.wav
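The curl calls above can also be issued from Python. The helper below is a hypothetical convenience (build_speech_payload is not part of the PR); it only assumes the server accepts plain JSON at /v1/audio/speech, as shown in the test plan.

```python
import json

# Hypothetical helper mirroring the curl calls above: it builds the JSON
# body for POST /v1/audio/speech. Parameter names come from the test plan;
# the helper itself is illustrative.
def build_speech_payload(text, task_type="CustomVoice", **extra):
    payload = {"input": text, "task_type": task_type}
    payload.update(extra)  # voice, language, instructions, ref_audio, ...
    return json.dumps(payload)

# Equivalent of the "Base" (voice clone) request:
body = build_speech_payload(
    "Good one. Okay, fine, I am just gonna leave this sock monkey here.",
    task_type="Base",
    language="Auto",
    ref_audio="https://example.com/ref.wav",  # placeholder, not the real URL
    ref_text="Okay. Yeah.",
    x_vector_only_mode=False,
)
# POST it with e.g.:
# requests.post("http://localhost:8000/v1/audio/speech", data=body,
#               headers={"Content-Type": "application/json"})
```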

Test Result

All three task types successfully generate audio output:

CustomVoice:
output_customvoice_chinese.wav
output_customvoice_english.wav

VoiceDesign:
output_voicedesign.wav

Base (Voice Clone):
output_base_clone.wav



@chatgpt-codex-connector (bot) left a comment

💡 Codex Review: here are some automated review suggestions for this pull request.

Reviewed commit: 18ee8361c3


@Gaohan123 (Collaborator) left a comment:

Thanks for the timely work. Overall it is clear and makes sense. Please address the remaining review comments. Thanks!

try:
prompt = {"prompt": request.input}
# Check if this is a Qwen3-TTS model
if self._is_tts_model():
Gaohan123 (Collaborator):

The logic here seems a bit confusing. _is_tts_model sounds like it should match all TTS models, but it actually matches only Qwen3-TTS, with the else branch handling other TTS models. In my view, we can generally check whether a request includes a given parameter and process it if present, skipping it otherwise. That would be easier to understand and more generalized.
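The suggestion above — forward whichever optional parameters the request actually carries, instead of branching on the model — might look like this minimal sketch. The parameter list is taken from the PR description; the helper name is hypothetical.

```python
# Hypothetical sketch of presence-based parameter handling: forward only
# the optional TTS parameters the client actually supplied, with no
# model-specific branching. Parameter names come from the PR description.
_OPTIONAL_TTS_PARAMS = ("task_type", "language", "ref_audio",
                        "ref_text", "x_vector_only_mode")

def collect_tts_params(request: dict) -> dict:
    # Keep only parameters that are present and non-null on the request.
    return {k: request[k] for k in _OPTIONAL_TTS_PARAMS
            if request.get(k) is not None}
```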

linyueqian (Contributor, author):

I will revise it to keep model detection only for the prompt format (which I think is model-specific) while generalizing the parameter handling.

Gaohan123 (Collaborator):

I think the updated version mostly makes sense. One other idea: in OmniOpenAIServingSpeech, have self.engine_client maintain the required input parameters for each loaded model. Then you could generically check whether each request includes the required parameters, without model-specific implementations. Of course this needs more work; if it is too heavy, we can leave it for a follow-up PR.



class OmniOpenAIServingSpeech(OpenAIServing, AudioMixin):
def _is_tts_model(self) -> bool:
Gaohan123 (Collaborator):

Is this a valid check for tts_model_stage? We are going to split qwen3_tts into two stages, and the model_stage will change accordingly.

linyueqian (Contributor, author):

If the stage names change, this check will break. Is there a better way to detect Qwen3-TTS models?

@hsliuustc0106 (Collaborator) commented:

I would like to suggest that we implement parameter validations as follows:

  1. Voice Field Change: The voice field was modified from a strict Literal with specific allowed values to an unrestricted str | None, which has removed validation for allowed voice names such as Vivian, Ryan, etc.

  2. Missing Task_type Dependency Validation: There is currently no validation ensuring that the Base task has the required ref_audio parameter, nor a check confirming that the CustomVoice task is provided with a valid voice parameter. The task type can be set, but parameter requirements aren’t enforced.

  3. Language Field Validation: The language field accepts any arbitrary string without validation for supported languages like Chinese, English, Japanese, Korean, or Auto.

  4. Ref_Audio Format Validation: The ref_audio format lacks validation to confirm whether it’s a valid URL, file path, or base64 encoded data. Additionally, there is no check for supported audio formats.

  5. Cross-Parameter Validation: There is no validation ensuring that ref_text is only used with the Base task, nor that x_vector_only_mode is only meaningful with the Base task. Also, no checks are in place for conflicting parameters between different task types.

  6. Max_New_Tokens Constraints: The max_new_tokens range does not have defined min/max constraints (unlike speed, which has ge=0.25 and le=4.0).

  7. Empty/Null Checks: There are no checks in place to validate that input text is not empty or to enforce reasonable length limits on instructions.

Adding these validations would enhance the robustness of the API and provide clearer error messages to users, ultimately leading to a better user experience.
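A minimal sketch of a few of these checks (items 2, 3, and 5), assuming plain functions rather than whatever validators were actually merged; the voice and language sets are abbreviated placeholders.

```python
# Hypothetical validation helper; not the code merged in this PR.
_VOICES = {"Vivian", "Ryan"}  # abbreviated; the real allow-list is longer
_LANGUAGES = {"Auto", "Chinese", "English", "Japanese", "Korean"}

def validate_speech_request(task_type, voice=None, language="Auto",
                            ref_audio=None, ref_text=None):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if task_type == "Base" and not ref_audio:
        errors.append("Base task requires ref_audio")
    if task_type == "CustomVoice" and voice not in _VOICES:
        errors.append(f"CustomVoice task requires a known voice, got {voice!r}")
    if language not in _LANGUAGES:
        errors.append(f"unsupported language {language!r}")
    if ref_text and task_type != "Base":
        errors.append("ref_text is only valid for the Base task")
    return errors
```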

@linyueqian (Contributor, author), quoting the validation suggestions above:

I just added them in the serving layer.



class OmniOpenAIServingSpeech(OpenAIServing, AudioMixin):
def _requires_qwen3_tts_prompt(self) -> bool:
(Collaborator):

All of the names should be model-agnostic.

@Gaohan123 (Collaborator) left a comment:
Please add some unit tests to protect the important new methods. Thanks!

@hsliuustc0106 (Collaborator) left a comment:
LGTM

@hsliuustc0106 added the `ready` (trigger buildkite CI) label on Jan 27, 2026
@Gaohan123 (Collaborator) left a comment:

LGTM. Thanks for the valuable work!

@Gaohan123 Gaohan123 merged commit 77ff875 into vllm-project:main Jan 27, 2026
7 checks passed
"Eric",
"Ryan",
"Aiden",
"One_Anna",

There is a typo here: it should be Ono_Anna, not One_Anna.

linyueqian (Contributor, author):
This is due to a pre-commit auto-fix. I will patch a fix now.

"One_Anna",
"Sohee",
}
_TTS_LANGUAGES: set[str] = {"Auto", "Chinese", "English", "Japanese", "Korean"}

According to https://qwen.ai/blog?id=qwen3-tts-1128, it should also support German, Italian, Portuguese, Spanish, French, and Russian.

linyueqian (Contributor, author):

Thanks! I will change this as well.

nussejzz pushed a commit to nussejzz/vllm-omni that referenced this pull request Jan 27, 2026
@verigle commented Jan 27, 2026:

Could the container image be updated?

@chuanSir123:

It seems concurrent processing is not supported: with 3 concurrent requests, they are processed sequentially, and inference time increases linearly.

@chenchen0611:

I encountered the same concurrent-processing problem.

@RinRin-32:

Thank you for the online serving support. Should we also have a way to use a precomputed x-vector? That way, repeatedly cloning a voice from the same file would not need to run speaker encoding each time. If so, I can work on the feature myself.

@linyueqian linyueqian deleted the feature/qwen3-tts-online-serving branch January 29, 2026 06:03
@linyueqian (Contributor, author), replying to the precomputed x-vector question above:

We don't support precomputed x-vectors in the online serving API yet. Feel free to open a PR!
