[Misc] Add stage config for Qwen3-Omni-30B-A3B-Thinking #172
ywang96 merged 6 commits into vllm-project:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```yaml
engine_output_type: text
distributed_executor_backend: "mp"
enable_prefix_caching: false
hf_config_name: thinker_config
```
Avoid nested thinker_config for Thinking checkpoints
This YAML sets hf_config_name: thinker_config, which makes OmniModelConfig.draw_hf_text_config (vllm_omni/config/model.py:79-85) dereference hf_config.thinker_config before building the model. The Qwen3-Omni-*Thinking checkpoints you are targeting only ship the thinker config itself (Qwen3OmniMoeThinkerConfig) and do not wrap it in a thinker_config attribute, so loading this stage file against those models will raise AttributeError and the config cannot be used. Drop the hf_config_name indirection (and use the thinker architecture) so thinker-only checkpoints load successfully.
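Concretely, a thinker-only stage entry along the lines of this suggestion might look like the sketch below. It is based only on the fields visible in the diff; any other fields a real stage file needs are not shown.

```yaml
# Hypothetical fixed stage entry for a thinker-only Thinking checkpoint.
# hf_config_name is dropped so the top-level Qwen3OmniMoeThinkerConfig is
# used directly, instead of dereferencing a nested thinker_config attribute
# that Thinking checkpoints do not ship.
engine_output_type: text
distributed_executor_backend: "mp"
enable_prefix_caching: false
```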
Actually, I do have a question: it looks like we are currently using the HuggingFace model_type to identify the stage config YAML.
vllm-omni/vllm_omni/entrypoints/utils.py
Lines 41 to 47 in 574e1fb
How does this work for this model? qwen3_omni_moe_thinking isn't a valid model_type, right? https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Thinking/blob/main/config.json#L10
I added a small check in utils.py; would that work?
Gaohan123 left a comment
Is it possible to use thinking mode for end-to-end audio generation?
vllm_omni/entrypoints/utils.py (outdated)

```python
# (no talker/code2wav configs) but reuse the base qwen3_omni_moe model_type.
# Detect this using multiple hints so users don't need to manually rewrite
# the stage config path.
is_qwen3_omni_moe_thinking = (
```
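For illustration, the truncated check above could be completed roughly as follows. This is a sketch, not the PR's exact code; the hint attribute names `talker_config` and `code2wav_config` are assumptions drawn from the comment in the diff.

```python
def is_thinking_only_checkpoint(hf_config) -> bool:
    """Heuristic sketch: treat a qwen3_omni_moe checkpoint as a
    thinker-only "Thinking" variant when it ships neither a talker
    nor a code2wav config. Attribute names here are illustrative."""
    has_talker = getattr(hf_config, "talker_config", None) is not None
    has_code2wav = getattr(hf_config, "code2wav_config", None) is not None
    return not (has_talker or has_code2wav)
```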
Is it possible to set this up in the stage config alone? This logic is a bit model-specific for a general utils module.
If we only add the YAML without this routing logic, vLLM will automatically pick qwen3_omni_moe.yaml because of the shared model_type, and the user would then have to explicitly pass --stage-config vllm_omni/.../qwen3_omni_moe_thinking.yaml every time.
I understand your concern about polluting utils.py with model-specific code. Could you point me to a better place for this auto-detection?
I think it is totally OK to add a custom config file under examples. After all, the stage_configs folder is just for default settings.
I have moved it to the examples folder.
Gaohan123 left a comment
I think it is good. Please use `git commit -s` to pass the DCO check, and then I will help to merge. Thanks!
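For anyone unfamiliar with the DCO workflow, a sign-off trailer can be added to an existing commit as shown below. This is standard git usage demonstrated in a throwaway repo, not anything specific to this PR.

```shell
# Create a throwaway repo so these commands are safe to run anywhere.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "dev"
echo demo > file && git add file
git commit -q -m "Add stage config"

# Amend the last commit in place, appending a Signed-off-by trailer
# (equivalent to having committed with `git commit -s`).
git commit --amend -s --no-edit -q
git log -1 --format=%B   # last line: Signed-off-by: dev <dev@example.com>
```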
@Gaohan123 I have added DCO sign-offs. Thanks!
Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models that only have the thinker component (text-only output, no audio synthesis).
Signed-off-by: linyueqian <[email protected]>
Force-pushed 6543b74 to 0f87094
…#172) Signed-off-by: linyueqian <[email protected]> Signed-off-by: Prajwal A <[email protected]>
…#172) Signed-off-by: linyueqian <[email protected]> Signed-off-by: Fanli Lin <[email protected]>
…#172) Signed-off-by: linyueqian <[email protected]>
Purpose
Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models (e.g., Qwen3-Omni-30B-A3B-Thinking) that only
have the thinker component and produce text-only output (no audio synthesis).
Test Plan
N/A (config file only)
Test Result
Verified on 2x H200 GPUs with tensor_parallel_size=2.