Dev/rebase 0.14.0 and Support GLM-Image#847

Merged
hsliuustc0106 merged 59 commits into main from dev/rebase_0.14.0 on Jan 20, 2026
Conversation

@tzhouam (Collaborator) commented on Jan 19, 2026:

Purpose

This PR rebases vLLM-Omni onto vLLM 0.14.0.

Test Qwen 2.5 Omni

python3 openai_chat_completion_client_for_multimodal_generation.py -q mixed_modalities
Chat completion output from text: The audio recites "Mary had a little lamb". The image shows a baby with glasses reading a book on a bed.The video might be funny because it's so unexpected to see a baby wearing glasses and actually reading a book. It gives off a really cute and adorable vibe.What do you think about it? Do you have any other thoughts or questions?

audio_0.wav

Test Qwen 3 Omni

python3 openai_chat_completion_client_for_multimodal_generation.py -q use_mixed_modalities
Chat completion output from text: Based on the provided audio and images, here is an analysis of your questions:

### 1. What is recited in the audio?

The speaker recites the first verse of the classic English nursery rhyme "Mary Had a Little Lamb":

> "Mary had a little lamb,
> Its fleece was white as snow.
> And everywhere that Mary went,
> The lamb was sure to go."

This is followed by the speaker's comment, "A little piece of practical poetry."

### 2. What is the content of this image?

The image shows a low-angle view of the Tokyo Skytree tower against a clear blue sky. In the foreground, there are out-of-focus pink cherry blossoms (sakura) on tree branches, creating a beautiful springtime scene.

### 3. Why is this video funny?

The humor in the video comes from the juxtaposition of two completely unrelated scenes:

*   **The Audio:** It features a man speaking in a formal, historical tone about the "original phonograph" and reciting a children's poem.
*   **The Video:** It shows a very young child, wearing large glasses, sitting on a bed and intently flipping through the pages of a book.

The comedy arises because the serious, adult-oriented narration does not match the simple, innocent, and slightly absurd sight of a baby "reading." The contrast between the grand, historical subject matter and the mundane, adorable activity of a toddler creates a surreal and humorous effect.
Audio saved to audio_0.wav

audio_0.wav



JaredforReal and others added 30 commits January 8, 2026 17:55
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: tzhouam <[email protected]>
Signed-off-by: tzhouam <[email protected]>
… Qwen3 Omni Thinker is not finished)

Signed-off-by: tzhouam <[email protected]>
Signed-off-by: tzhouam <[email protected]>
Signed-off-by: tzhouam <[email protected]>
Signed-off-by: tzhouam <[email protected]>
…pport AR Offiline) (#763)

Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: root <[email protected]>
Signed-off-by: tzhouam <[email protected]>
Co-authored-by: JaredforReal <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: tzhouam <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
@tzhouam added the `ready` label (label to trigger buildkite CI) on Jan 19, 2026
@chatgpt-codex-connector (bot) commented:
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52d20a7858

Comment on lines +186 to +190:

```python
if tool_parsing_unavailable and request.tool_choice not in (
    None,
    "none",
):
    # for hf tokenizers, "auto" tools requires
    # --enable-auto-tool-choice and --tool-call-parser
    return self.create_error_response(
        '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set'
    )
if request.tool_choice == "auto" and not self.enable_auto_tools:
```


P2: Reject auto tool choice when parser missing

In OmniOpenAIServingChat.create_chat_completion, the new validation allows tool_choice="auto" to proceed when tool_parsing_unavailable is true as long as --enable-auto-tool-choice is set, but it never checks that a tool_parser exists. In the common non‑Mistral, non‑Harmony path, this means requests with tool_choice="auto" and no --tool-call-parser no longer error and will later skip tool parsing (all tool extraction paths are gated on self.tool_parser), so the API silently returns plain text instead of tool calls. This is a behavior regression for users who enable auto tools but forget to configure a parser.


@david6666666 added the `high priority` label (high priority issue, needs to be done asap) on Jan 20, 2026
Signed-off-by: tzhouam <[email protected]>

```python
input_ids = self.input_ids.gpu[:num_input_tokens]
inputs_embeds = self.inputs_embeds.gpu[:num_input_tokens]
input_ids, inputs_embeds = self._prepare_mm_inputs(num_input_tokens)
```
@gcanlin (Contributor) commented on Jan 20, 2026:
Notice that input_ids = self.input_ids.gpu[:num_input_tokens] was newly added in vLLM-Omni. If we replace it with _prepare_mm_inputs, input_ids will be non-None only when self.model.requires_raw_input_tokens is true. Could this apply to Qwen-Omni?

@tzhouam (Collaborator, author) replied:
Yes, both models that require input IDs set requires_raw_input_tokens=True.

@gcanlin (Contributor) replied:
Oh. You set it in the modeling file. Thanks for explaining!

Signed-off-by: tzhouam <[email protected]>
@ZJY0516 ZJY0516 mentioned this pull request Jan 20, 2026
```diff
@@ -21,21 +25,19 @@ class GPUGenerationWorker(GPUWorker):
     """

     def init_device(self):
```
@gcanlin (Contributor) commented on Jan 20, 2026:
Why not directly use super().init_device() here? It seems to be exactly the same as upstream; we only need self.model_runner = GPUGenerationModelRunner(self.vllm_config, self.device).
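The reviewer's suggestion can be sketched as follows. Both classes are minimal stand-ins (strings instead of real devices and runners); only the super().init_device() pattern is the point.

```python
# Hypothetical sketch of the suggested refactor: inherit the upstream
# device setup via super().init_device() and override only the runner.
# Both classes are minimal stand-ins, not the real vLLM-Omni workers.
class GPUWorker:
    def init_device(self):
        self.device = "cuda:0"            # stand-in for real device setup
        self.model_runner = "GPUModelRunner"

class GPUGenerationWorker(GPUWorker):
    def init_device(self):
        super().init_device()             # reuse the identical upstream steps
        # The only Omni-specific change: swap in the generation runner.
        self.model_runner = f"GPUGenerationModelRunner({self.device})"
```

This keeps the subclass from silently drifting when the upstream init_device changes.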

@hsliuustc0106 (Collaborator) left a comment:

lgtm

@hsliuustc0106 hsliuustc0106 merged commit 776c3a7 into main Jan 20, 2026
7 checks passed
@david6666666 changed the title from "Dev/rebase 0.14.0" to "Dev/rebase 0.14.0 and Support GLM-Image" on Jan 22, 2026

Labels

high priority (high priority issue, needs to be done asap), ready (label to trigger buildkite CI)
