[Bugfix] Fix precision issues of Qwen3-Omni when async_chunk is enabled without a system prompt (#1288)
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c77d471ad7
Force-pushed from c77d471 to 35feba7
Pull request overview
This PR aims to fix precision issues in Qwen3-Omni when async_chunk is enabled and requests do not include a system prompt (per vllm-omni issue #1278), by adjusting how decode-time state (e.g., num_processed_tokens) is initialized and tracked.
Changes:
- Introduces a `decode_flag` to make a decode-time `num_processed_tokens` adjustment happen only once.
- Threads an `update_dict` through the decode preprocess call path (but currently with an implementation issue that drops state).
```diff
  if not info_dict.get("decode_flag", False):
      info_dict["num_processed_tokens"] = len(info_dict.get("thinker_input_ids", [])) + 1
      update_dict["decode_flag"] = True

  last_talker_hidden, text_step, update_dict = self.talker_preprocess_decode(
-     input_ids, input_embeds, **info_dict
+     input_ids, input_embeds, update_dict, **info_dict
  )
```
In the decode path you set update_dict["decode_flag"] = True to ensure the num_processed_tokens adjustment happens only once, but talker_preprocess_decode immediately reinitializes update_dict and the returned dict overwrites the caller’s update_dict. This drops decode_flag, so the adjustment will repeat on every decode step. Consider having talker_preprocess_decode mutate/extend the passed-in update_dict (or merge the caller’s entries into the returned dict) so decode_flag persists across decode iterations.
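The state-dropping problem described above can be sketched in isolation. The function and dict-key names below mirror the PR but are simplified stand-ins, not the real implementation: the point is that `talker_preprocess_decode` merges into the caller's `update_dict` instead of reinitializing it, so `decode_flag` survives across decode iterations.

```python
# Hypothetical sketch of the suggested fix; names mirror the PR but
# the bodies are simplified placeholders, not the actual model code.

def talker_preprocess_decode(update_dict, **info_dict):
    # Merge into a copy of the caller's dict instead of reinitializing it,
    # so flags such as decode_flag persist across decode iterations.
    merged = dict(update_dict)
    merged["num_processed_tokens"] = info_dict.get("num_processed_tokens", 0)
    return merged

def decode_step(info_dict):
    update_dict = {}
    # The num_processed_tokens adjustment must run only on the first step.
    if not info_dict.get("decode_flag", False):
        info_dict["num_processed_tokens"] = (
            len(info_dict.get("thinker_input_ids", [])) + 1
        )
        update_dict["decode_flag"] = True
    update_dict = talker_preprocess_decode(update_dict, **info_dict)
    info_dict.update(update_dict)
    return info_dict

state = {"thinker_input_ids": [1, 2, 3]}
state = decode_step(state)  # first step: adjustment runs, flag is set
state = decode_step(state)  # later steps: flag persists, no re-adjustment
```

With the reinitializing version, `decode_flag` would be lost after the merge and the `+ 1` adjustment would re-run on every step, which is the precision bug being fixed.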
```python
if not info_dict.get("decode_flag", False):
    info_dict["num_processed_tokens"] = len(info_dict.get("thinker_input_ids", [])) + 1
    update_dict["decode_flag"] = True
```
This change introduces new per-request decode state (decode_flag) that is specifically meant to fix async_chunk behavior when there is no system prompt, but there doesn’t appear to be a regression test covering the “async_chunk + no system message” case. There are existing Qwen3-Omni E2E tests that always include a system prompt; adding a variant without the system message would help prevent future regressions.
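A minimal sketch of the missing regression case: building the same request payload both with and without a system message. The helper name and prompt text are assumptions, not the repo's actual test utilities.

```python
# Illustrative only: the real E2E tests use the repo's own request builders.

def build_messages(user_text, system_prompt=None):
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages

# Shape covered by the existing Qwen3-Omni E2E tests:
with_system = build_messages("hello", system_prompt="You are a helpful assistant.")
# Shape that triggers the async_chunk precision bug (no system message):
without_system = build_messages("hello")
```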
Force-pushed from 4771eb7 to 8f79a81
Could you also attach some test results?
Gaohan123 left a comment
I wonder what the scenario is for having no system prompt? It seems Qwen-Omni is unstable without one.

A user may forget to send the request with a system prompt.
Force-pushed from 6b93c07 to 9e3c791
```python
from vllm.assets.audio import AudioAsset
from vllm.utils.argparse_utils import FlexibleArgumentParser

# Modify OpenAI's API key and API base to use vLLM's API server.
```
Why do you need to modify this?
Because the port is hard-coded here. I think it's better to let users choose their own port, so I added an argument for it; the default is 8091, same as before. I always run into port conflict problems ...
Maybe it's better to update all of openai_chat_completion_client_for_multimodal_generation?
Maybe later, not in this PR.
vllm_omni/entrypoints/chat_utils.py (Outdated)
| "content": default_qwen_omni_system_prompt, | ||
| } | ||
|
|
||
| logger.info("injecting system prompt for Qwen-Omni model") |
I think you can show the full system prompt here for clarity.
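One way to do that is to interpolate the prompt into the log line instead of emitting a generic message. This is a sketch only; `default_qwen_omni_system_prompt` below is a placeholder string, and the real value and logger live in vllm_omni/entrypoints/chat_utils.py.

```python
# Sketch of the review suggestion: log the full injected prompt text.
import logging

logger = logging.getLogger(__name__)
default_qwen_omni_system_prompt = "You are Qwen-Omni."  # placeholder text

message = (
    "injecting system prompt for Qwen-Omni model: %r"
    % default_qwen_omni_system_prompt
)
logger.info(message)
```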
```python
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
port = getattr(args, "port", 8091)
openai_api_base = f"http://localhost:{port}/v1"
```
Can the --host parameter be added?
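A possible shape for that suggestion: expose both `--host` and `--port` instead of hard-coding them, with defaults keeping the previous behavior (localhost:8091). This is a sketch against plain argparse; argument names and defaults are assumptions, not the example script's final form.

```python
# Illustrative sketch of a configurable host/port for the example client.
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--host", type=str, default="localhost",
                    help="host of the vLLM API server")
parser.add_argument("--port", type=int, default=8091,
                    help="port of the vLLM API server (8091, as before)")
args = parser.parse_args([])  # use the defaults for demonstration

openai_api_key = "EMPTY"
openai_api_base = f"http://{args.host}:{args.port}/v1"
```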
Force-pushed from 54c7a07 to ea74c67
…stem_prompt Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Force-pushed from ea74c67 to 5e8b334
…ithout system prompt (vllm-project#1288) Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
fix precision issues of qwen3-omni when enable async_chunk without system prompt
Purpose
Fix precision issues of Qwen3-Omni when async_chunk is enabled without a system prompt.
Solves #1278.
Inject a system prompt for Qwen-Omni models if the user's request did not include one.
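The injection behavior described above can be sketched as: prepend a default system message only when the request has none. Names are illustrative placeholders; the actual change is in vllm_omni/entrypoints/chat_utils.py.

```python
# Hedged sketch of the system-prompt injection; not the repo's real code.

DEFAULT_QWEN_OMNI_SYSTEM_PROMPT = "You are Qwen-Omni."  # placeholder text

def inject_system_prompt(messages):
    if any(m.get("role") == "system" for m in messages):
        return messages  # the user supplied one; leave the request untouched
    return [{"role": "system",
             "content": DEFAULT_QWEN_OMNI_SYSTEM_PROMPT}] + messages

injected = inject_system_prompt([{"role": "user", "content": "hi"}])
untouched = inject_system_prompt([{"role": "system", "content": "custom"},
                                  {"role": "user", "content": "hi"}])
```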
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)