Dev/rebase 0.14.0 and Support GLM-Image#847
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52d20a7858
```python
if tool_parsing_unavailable and request.tool_choice not in (
    None,
    "none",
):
    # for hf tokenizers, "auto" tools requires
    # --enable-auto-tool-choice and --tool-call-parser
    return self.create_error_response(
        '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set'
    )
if request.tool_choice == "auto" and not self.enable_auto_tools:
```
Reject auto tool choice when parser missing
In OmniOpenAIServingChat.create_chat_completion, the new validation allows tool_choice="auto" to proceed when tool_parsing_unavailable is true as long as --enable-auto-tool-choice is set, but it never checks that a tool_parser exists. In the common non-Mistral, non-Harmony path, requests with tool_choice="auto" and no --tool-call-parser therefore no longer error; they later skip tool parsing entirely (all tool extraction paths are gated on self.tool_parser), so the API silently returns plain text instead of tool calls. This is a behavior regression for users who enable auto tools but forget to configure a parser.
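A stricter guard along the lines the review suggests might look like the sketch below. The class and method names (ToolChoiceValidator, validate) are hypothetical stand-ins for the real OmniOpenAIServingChat logic:

```python
class ToolChoiceValidator:
    """Hypothetical stand-in for the tool-choice check in
    OmniOpenAIServingChat; names and structure are assumptions."""

    def __init__(self, enable_auto_tools, tool_parser):
        self.enable_auto_tools = enable_auto_tools
        self.tool_parser = tool_parser

    def validate(self, tool_choice):
        # Reject "auto" unless both the CLI flag and a parser are present,
        # so requests can never silently skip tool extraction.
        if tool_choice == "auto" and (
            not self.enable_auto_tools or self.tool_parser is None
        ):
            return (
                '"auto" tool choice requires --enable-auto-tool-choice '
                "and --tool-call-parser to be set"
            )
        return None  # request may proceed
```

With this shape, enable_auto_tools=True combined with tool_parser=None is rejected up front instead of degrading to plain-text output later.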
```python
input_ids = self.input_ids.gpu[:num_input_tokens]
inputs_embeds = self.inputs_embeds.gpu[:num_input_tokens]
input_ids, inputs_embeds = self._prepare_mm_inputs(num_input_tokens)
```
Notice that input_ids = self.input_ids.gpu[:num_input_tokens] was newly added in vLLM-Omni. But if we replace it with _prepare_mm_inputs, then input_ids will only be non-None when self.model.requires_raw_input_tokens is true. Could this apply to Qwen-Omni?
Yes, both models that require input IDs set requires_raw_input_tokens=True.
Oh. You set it in the modeling file. Thanks for explaining!
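The gating discussed in this thread can be sketched as follows. prepare_mm_inputs and its buffer arguments are hypothetical simplifications of the model runner's _prepare_mm_inputs, not the actual vLLM-Omni code:

```python
def prepare_mm_inputs(model, input_ids_buf, inputs_embeds_buf, num_input_tokens):
    # input_ids is only materialized when the model opts in via
    # requires_raw_input_tokens (as the Qwen-Omni models do); otherwise
    # downstream code sees None and relies on inputs_embeds alone.
    inputs_embeds = inputs_embeds_buf[:num_input_tokens]
    if getattr(model, "requires_raw_input_tokens", False):
        input_ids = input_ids_buf[:num_input_tokens]
    else:
        input_ids = None
    return input_ids, inputs_embeds
```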
```diff
@@ -21,21 +25,19 @@ class GPUGenerationWorker(GPUWorker):
     """

     def init_device(self):
```
Why not directly use super().init_device() here? It seems identical to upstream; we would only need self.model_runner = GPUGenerationModelRunner(self.vllm_config, self.device).
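The suggested simplification could look like the sketch below. GPUWorker and GPUGenerationModelRunner are stubbed here, so treat it as an illustration of the super() call rather than the actual vLLM-Omni classes:

```python
class GPUWorker:
    """Stub of the upstream worker; the real init_device does much more."""

    def init_device(self):
        self.device = "cuda:0"  # simplified device setup


class GPUGenerationModelRunner:
    """Stub of the Omni model runner (assumed constructor signature)."""

    def __init__(self, vllm_config, device):
        self.vllm_config = vllm_config
        self.device = device


class GPUGenerationWorker(GPUWorker):
    def __init__(self, vllm_config):
        self.vllm_config = vllm_config

    def init_device(self):
        # Reuse the upstream setup verbatim, then swap in only the
        # Omni-specific model runner, as the review suggests.
        super().init_device()
        self.model_runner = GPUGenerationModelRunner(self.vllm_config, self.device)
```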
Purpose
This PR rebases vLLM-Omni onto vLLM 0.14.0.
Test Qwen 2.5 Omni
audio_0.wav
Test Qwen 3 Omni
audio_0.wav
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.