Dev/rebase 0.14.0 and Support GLM-Image#847
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52d20a7858
```python
if tool_parsing_unavailable and request.tool_choice not in (
    None,
    "none",
):
    # for hf tokenizers, "auto" tools requires
    # --enable-auto-tool-choice and --tool-call-parser
    return self.create_error_response(
        '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set'
    )
if request.tool_choice == "auto" and not self.enable_auto_tools:
```
Reject auto tool choice when parser missing
In OmniOpenAIServingChat.create_chat_completion, the new validation allows tool_choice="auto" to proceed when tool_parsing_unavailable is true as long as --enable-auto-tool-choice is set, but it never checks that a tool_parser exists. In the common non-Mistral, non-Harmony path, requests with tool_choice="auto" and no --tool-call-parser therefore no longer error; they later skip tool parsing entirely (all tool extraction paths are gated on self.tool_parser), so the API silently returns plain text instead of tool calls. This is a behavior regression for users who enable auto tools but forget to configure a parser.
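A stricter guard along the lines the review suggests might look like the sketch below. The class and method names (ToolChoiceValidator, validate) are hypothetical stand-ins for the real OmniOpenAIServingChat logic:

```python
class ToolChoiceValidator:
    """Hypothetical stand-in for the tool-choice check in
    OmniOpenAIServingChat; names and structure are assumptions."""

    def __init__(self, enable_auto_tools, tool_parser):
        self.enable_auto_tools = enable_auto_tools
        self.tool_parser = tool_parser

    def validate(self, tool_choice):
        # Reject "auto" unless both the CLI flag and a parser are present,
        # so requests can never silently skip tool extraction.
        if tool_choice == "auto" and (
            not self.enable_auto_tools or self.tool_parser is None
        ):
            return (
                '"auto" tool choice requires --enable-auto-tool-choice '
                "and --tool-call-parser to be set"
            )
        return None  # request may proceed
```

With this shape, enable_auto_tools=True combined with tool_parser=None is rejected up front instead of degrading to plain-text output later.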
```python
input_ids = self.input_ids.gpu[:num_input_tokens]
inputs_embeds = self.inputs_embeds.gpu[:num_input_tokens]
input_ids, inputs_embeds = self._prepare_mm_inputs(num_input_tokens)
```
Notice that input_ids = self.input_ids.gpu[:num_input_tokens] was newly added in vLLM-Omni. But if we replace it with _prepare_mm_inputs, then input_ids will only be non-None when self.model.requires_raw_input_tokens is true. Could this apply to Qwen-Omni?
Yes, both models that require input IDs set requires_raw_input_tokens=True.
Oh. You set it in the modeling file. Thanks for explaining!
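The gating discussed in this thread can be sketched as follows. prepare_mm_inputs and its buffer arguments are hypothetical simplifications of the model runner's _prepare_mm_inputs, not the actual vLLM-Omni code:

```python
def prepare_mm_inputs(model, input_ids_buf, inputs_embeds_buf, num_input_tokens):
    # input_ids is only materialized when the model opts in via
    # requires_raw_input_tokens (as the Qwen-Omni models do); otherwise
    # downstream code sees None and relies on inputs_embeds alone.
    inputs_embeds = inputs_embeds_buf[:num_input_tokens]
    if getattr(model, "requires_raw_input_tokens", False):
        input_ids = input_ids_buf[:num_input_tokens]
    else:
        input_ids = None
    return input_ids, inputs_embeds
```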
```diff
@@ -21,21 +25,19 @@ class GPUGenerationWorker(GPUWorker):
     """

     def init_device(self):
```
Why not directly use super().init_device() here? It seems identical to upstream; we would only need self.model_runner = GPUGenerationModelRunner(self.vllm_config, self.device).
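The suggested simplification could look like the sketch below. GPUWorker and GPUGenerationModelRunner are stubbed here, so treat it as an illustration of the super() call rather than the actual vLLM-Omni classes:

```python
class GPUWorker:
    """Stub of the upstream worker; the real init_device does much more."""

    def init_device(self):
        self.device = "cuda:0"  # simplified device setup


class GPUGenerationModelRunner:
    """Stub of the Omni model runner (assumed constructor signature)."""

    def __init__(self, vllm_config, device):
        self.vllm_config = vllm_config
        self.device = device


class GPUGenerationWorker(GPUWorker):
    def __init__(self, vllm_config):
        self.vllm_config = vllm_config

    def init_device(self):
        # Reuse the upstream setup verbatim, then swap in only the
        # Omni-specific model runner, as the review suggests.
        super().init_device()
        self.model_runner = GPUGenerationModelRunner(self.vllm_config, self.device)
```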
Purpose
This PR rebases vLLM-Omni onto vLLM 0.14.0.
Test Qwen 2.5 Omni
audio_0.wav
Test Qwen 3 Omni
audio_0.wav
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.