Support Qwen3 TTS online serving #968
Conversation
Signed-off-by: linyueqian <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18ee8361c3
Signed-off-by: linyueqian <[email protected]>
Gaohan123
left a comment
Thanks for the timely work. Overall it is clear and makes sense. Please resolve several left comments for your reference. Thanks!
```python
try:
    prompt = {"prompt": request.input}
    # Check if this is a Qwen3-TTS model
    if self._is_tts_model():
```
The logic here seems a bit confusing. `_is_tts_model` looks like it should match all TTS models, but it actually matches only Qwen3-TTS, with the else branch covering other TTS models. In my view, we can generally check whether a request includes a certain parameter and process it; if the parameter is absent, we just skip it. That would be easier to understand and more generalized.
I will revise it to keep model detection only for the prompt format (I think that part is model-specific) while making the parameter handling generalized.
I think the updated version primarily makes sense. There is another way: in OmniOpenAIServingSpeech, let self.engine_client maintain the required input params for each loaded model. Then here you can generically check whether each request includes a required param, without model-specific implementations. Of course this needs more work; if it is a bit heavy, we can also leave it for a new PR.
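The presence-based parameter handling discussed above could look like the following sketch. The helper name is hypothetical (not the PR's actual implementation); the field names (`task_type`, `language`, `ref_audio`, `ref_text`, `x_vector_only_mode`) come from this PR's request schema.

```python
# Sketch of presence-based parameter handling: instead of branching on
# the model family, collect whichever optional TTS fields the request
# actually set, and skip the rest.
_OPTIONAL_TTS_FIELDS = (
    "task_type", "language", "ref_audio", "ref_text", "x_vector_only_mode",
)

def collect_tts_params(request: dict) -> dict:
    """Return only the optional TTS parameters present in the request."""
    return {
        k: request[k]
        for k in _OPTIONAL_TTS_FIELDS
        if request.get(k) is not None
    }
```

A request without any of these fields yields an empty dict, so non-TTS (or non-Qwen3-TTS) requests pass through untouched.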
Signed-off-by: linyueqian <[email protected]>
```python
class OmniOpenAIServingSpeech(OpenAIServing, AudioMixin):
    def _is_tts_model(self) -> bool:
```
Is this a valid check for tts_model_stage? I think we are going to split qwen3_tts into 2 stages, and the model_stage will change accordingly.
If the stage names are changing, this check will break. Is there a better way to detect Qwen3-TTS models?
Signed-off-by: linyueqian <[email protected]>
Comment on Pull Request #968
I would like to suggest that we implement parameter validations as follows:
Adding these validations would enhance the robustness of the API and provide clearer error messages to users, ultimately leading to a better user experience.
Signed-off-by: linyueqian <[email protected]>
Signed-off-by: linyueqian <[email protected]>
I just added them in the serving layer.
```python
class OmniOpenAIServingSpeech(OpenAIServing, AudioMixin):
    def _requires_qwen3_tts_prompt(self) -> bool:
```
All of the names should be model-agnostic.
Gaohan123
left a comment
Please supplement some Unit Tests to protect your newly important methods. Thanks!
Signed-off-by: linyueqian <[email protected]>
Signed-off-by: linyueqian <[email protected]>
Gaohan123
left a comment
LGTM. Thanks for the valuable work!
```python
"Eric",
"Ryan",
"Aiden",
"One_Anna",
```
There is a typo here. It should be Ono_Anna, not One_Anna.
This is due to a pre-commit auto-fix. I will patch a fix now.
```python
"One_Anna",
"Sohee",
}
_TTS_LANGUAGES: set[str] = {"Auto", "Chinese", "English", "Japanese", "Korean"}
```
According to https://qwen.ai/blog?id=qwen3-tts-1128, it should also support German, Italian, Portuguese, Spanish, French, and Russian.
Thanks! I will change this as well.
Signed-off-by: linyueqian <[email protected]>
Signed-off-by: jzz <[email protected]>
Could the image be updated?
It seems that it doesn't support concurrent processing. When running 3 concurrent tasks, they are processed sequentially, and the inference time increases linearly.

I encountered the same concurrent processing problem.
Thank you for the online serving support. I wanted to ask if we should also have a method to use a precomputed x-vec? That way, voice cloning on a predetermined file repeatedly doesn't need to do speaker encoding repeatedly. If so, I can work on the feature myself.

We don't support precomputed x-vectors in the online serving API yet. Feel free to open a PR!
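The precomputed x-vector idea above could be prototyped as a cache keyed by the reference audio's content hash, so repeated cloning of the same file skips speaker encoding. Everything here is hypothetical (class name, encoder interface); it is only a sketch of the proposal, not part of this PR.

```python
import hashlib

class XVectorCache:
    """Hypothetical cache: compute the speaker embedding (x-vector) for a
    given reference audio once, keyed by a SHA-256 of the audio bytes."""

    def __init__(self, encoder):
        self._encoder = encoder          # callable: audio bytes -> embedding
        self._cache: dict[str, object] = {}

    def get(self, ref_audio: bytes):
        key = hashlib.sha256(ref_audio).hexdigest()
        if key not in self._cache:
            # Only runs the (expensive) speaker encoder on a cache miss.
            self._cache[key] = self._encoder(ref_audio)
        return self._cache[key]
```

An online API could also accept the embedding itself in the request (the "precomputed x-vec" case), bypassing the encoder entirely.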
Purpose
This PR adds online serving support for Qwen3-TTS models via the `/v1/audio/speech` endpoint, addressing Task 1 from RFC #938. The implementation extends the existing OpenAI-compatible speech API to support Qwen3-TTS specific parameters.
Key changes:
- Extended `OpenAICreateSpeechRequest` with Qwen3-TTS parameters (`task_type`, `language`, `ref_audio`, `ref_text`, `x_vector_only_mode`)
- Updated `serving_speech.py` to handle the Qwen3-TTS prompt format and `additional_information`
- Worked around a `@check_model_inputs` decorator compatibility issue (I could not pass the warmup if adding this line)

Test Plan
1. CustomVoice Task
2. VoiceDesign Task
3. Base Task (Voice Clone)
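A request payload for the tasks in the test plan might look like the sketch below. The parameter names come from this PR's `OpenAICreateSpeechRequest` extension; the model name and field values are illustrative placeholders, and the payload is only built, not sent.

```python
import json

# Illustrative /v1/audio/speech payload for the CustomVoice task;
# switch task_type to "VoiceDesign" or "Base" (with ref_audio/ref_text)
# for the other two test-plan tasks.
payload = {
    "model": "Qwen3-TTS",              # placeholder model name
    "input": "Hello from vllm-omni!",
    "voice": "Ryan",
    "task_type": "CustomVoice",
    "language": "English",
}
body = json.dumps(payload)
```

The serialized `body` would be POSTed to the server's `/v1/audio/speech` endpoint by any HTTP client.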
Test Result
All three task types successfully generate audio output:
CustomVoice:
output_customvoice_chinese.wav
output_customvoice_english.wav
VoiceDesign:
output_voicedesign.wav
Base (Voice Clone):
output_base_clone.wav
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.