[Bugfix] fix precision issues of qwen3-omni when enable async_chunk without system prompt#1288

Merged
Gaohan123 merged 2 commits into vllm-project:main from R2-Y:fix_precision_problem
Feb 11, 2026

Conversation

@R2-Y (Contributor) commented Feb 9, 2026


Purpose

Fix precision issues of qwen3-omni when async_chunk is enabled and the request has no system prompt.
Solves #1278.
Injects a system prompt for Qwen-Omni models if the user did not include one in the request.
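A minimal sketch of the injection described above (the helper name and the placeholder prompt text are assumptions; the actual default prompt lives in vllm-omni):

```python
# Hypothetical sketch of the injection logic; the real helper name and the
# actual default Qwen-Omni system prompt in vllm-omni differ.
DEFAULT_QWEN_OMNI_SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder

def maybe_inject_system_prompt(messages: list[dict]) -> list[dict]:
    """Prepend a default system message when the request carries none."""
    if any(m.get("role") == "system" for m in messages):
        return messages  # the user already supplied one; leave the request alone
    return [
        {"role": "system", "content": DEFAULT_QWEN_OMNI_SYSTEM_PROMPT},
        *messages,
    ]
```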

Test Plan

  1. Send a request without a system prompt.
  2. Send a request with a system prompt.

Test Result

  1. Request without a system prompt: [screenshots] the audio matches the text.
  2. Request with a system prompt: [screenshot]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@R2-Y R2-Y requested a review from hsliuustc0106 as a code owner February 9, 2026 11:53
@chatgpt-codex-connector (bot) left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c77d471ad7


Copilot AI (Contributor) left a comment
Pull request overview

This PR aims to fix precision issues in Qwen3-Omni when async_chunk is enabled and requests do not include a system prompt (per vllm-omni issue #1278), by adjusting how decode-time state (e.g., num_processed_tokens) is initialized and tracked.

Changes:

  • Introduces a decode_flag to make a decode-time num_processed_tokens adjustment happen only once.
  • Threads an update_dict through the decode preprocess call path (but currently with an implementation issue that drops state).


Comment on lines +604 to 610:

```diff
 if not info_dict.get("decode_flag", False):
     info_dict["num_processed_tokens"] = len(info_dict.get("thinker_input_ids", [])) + 1
     update_dict["decode_flag"] = True

 last_talker_hidden, text_step, update_dict = self.talker_preprocess_decode(
-    input_ids, input_embeds, **info_dict
+    input_ids, input_embeds, update_dict, **info_dict
 )
```

Copilot AI commented Feb 9, 2026:

In the decode path you set update_dict["decode_flag"] = True to ensure the num_processed_tokens adjustment happens only once, but talker_preprocess_decode immediately reinitializes update_dict and the returned dict overwrites the caller’s update_dict. This drops decode_flag, so the adjustment will repeat on every decode step. Consider having talker_preprocess_decode mutate/extend the passed-in update_dict (or merge the caller’s entries into the returned dict) so decode_flag persists across decode iterations.
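The rebinding pitfall described here can be reduced to a small sketch (hypothetical function bodies, not the real `talker_preprocess_decode`):

```python
# Sketch of the pitfall: if the callee rebinds update_dict to a fresh dict,
# the caller's decode_flag is dropped and the adjustment repeats every step.
def preprocess_decode_buggy(update_dict: dict) -> dict:
    update_dict = {}               # rebinds the local name; caller state is lost
    update_dict["step_done"] = True
    return update_dict

# Fix suggested in the review: mutate/extend the passed-in dict instead,
# so entries such as decode_flag persist across decode iterations.
def preprocess_decode_fixed(update_dict: dict) -> dict:
    update_dict["step_done"] = True
    return update_dict
```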

Comment on lines +604 to +606:

```python
if not info_dict.get("decode_flag", False):
    info_dict["num_processed_tokens"] = len(info_dict.get("thinker_input_ids", [])) + 1
    update_dict["decode_flag"] = True
```

Copilot AI commented Feb 9, 2026:

This change introduces new per-request decode state (decode_flag) that is specifically meant to fix async_chunk behavior when there is no system prompt, but there doesn’t appear to be a regression test covering the “async_chunk + no system message” case. There are existing Qwen3-Omni E2E tests that always include a system prompt; adding a variant without the system message would help prevent future regressions.

@R2-Y R2-Y force-pushed the fix_precision_problem branch 2 times, most recently from 4771eb7 to 8f79a81 Compare February 9, 2026 12:16
@R2-Y R2-Y changed the title fix precision issues of qwen3-omni when enable async_chunk without sy… [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without sy… Feb 9, 2026
@tzhouam tzhouam added the ready label to trigger buildkite CI label Feb 10, 2026
@tzhouam (Collaborator) commented Feb 10, 2026:

Could you also attach some test results?

@Gaohan123 (Collaborator) left a comment:

I wonder what the scenario without a system prompt is? It seems qwen-omni is unstable without a system prompt.

@Gaohan123 Gaohan123 added this to the v0.16.0 milestone Feb 10, 2026
@R2-Y (Contributor, author) commented Feb 10, 2026:

> I wonder what the scenario without a system prompt is? It seems qwen-omni is unstable without a system prompt.

Users may forget to include a system prompt in their request.

@R2-Y R2-Y force-pushed the fix_precision_problem branch 3 times, most recently from 6b93c07 to 9e3c791 Compare February 10, 2026 07:52
```python
from vllm.assets.audio import AudioAsset
from vllm.utils.argparse_utils import FlexibleArgumentParser

# Modify OpenAI's API key and API base to use vLLM's API server.
```
Collaborator commented:

Why do you need to modify this?

Contributor Author replied:

Because the port is hard-coded here. I think it's better to let users choose their own port, so I added a --port argument (default 8091, same as before). I always run into port conflicts ...

Contributor Author added:

Maybe it's better to update all of openai_chat_completion_client_for_multimodal_generation?

Collaborator replied:

Maybe later, not this PR

```python
    "content": default_qwen_omni_system_prompt,
}

logger.info("injecting system prompt for Qwen-Omni model")
```
Collaborator commented:

I think you can show the full system prompt here for clarity.

Contributor Author replied:

sure

```python
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
port = getattr(args, "port", 8091)
openai_api_base = f"http://localhost:{port}/v1"
```
Contributor commented:

Can the --host parameter be added?

Contributor Author replied:

sure
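The --port/--host handling discussed in this thread could look roughly like this (plain argparse as a stand-in for vLLM's FlexibleArgumentParser; the exact flag names are assumptions):

```python
from argparse import ArgumentParser  # stand-in for vLLM's FlexibleArgumentParser

def build_parser() -> ArgumentParser:
    parser = ArgumentParser(description="Example multimodal chat client")
    parser.add_argument("--host", default="localhost", help="API server host")
    parser.add_argument("--port", type=int, default=8091,
                        help="API server port (8091 keeps the old behavior)")
    return parser

args = build_parser().parse_args(["--host", "10.0.0.5", "--port", "9000"])
openai_api_base = f"http://{args.host}:{args.port}/v1"
```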

@Gaohan123 Gaohan123 removed the ready label to trigger buildkite CI label Feb 10, 2026
@R2-Y R2-Y force-pushed the fix_precision_problem branch 2 times, most recently from 54c7a07 to ea74c67 Compare February 10, 2026 08:49
…stem_prompt

Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
@R2-Y R2-Y force-pushed the fix_precision_problem branch from ea74c67 to 5e8b334 Compare February 10, 2026 08:49
@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Feb 10, 2026
@R2-Y R2-Y changed the title [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without sy… [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without system prompt Feb 10, 2026
@Gaohan123 (Collaborator) left a comment:

LGTM. Thanks!

@Gaohan123 Gaohan123 merged commit 963b64e into vllm-project:main Feb 11, 2026
7 checks passed
@R2-Y R2-Y deleted the fix_precision_problem branch February 11, 2026 02:05
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026
…ithout system prompt (vllm-project#1288)

Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Labels

ready label to trigger buildkite CI

6 participants