
[BugFix] Fix Qwen3 Omni talker mtp torch.compile startup error#1104

Merged
hsliuustc0106 merged 6 commits into vllm-project:main from ZeldaHuang:fix/talkermtp_torch_compile
Jan 30, 2026

Conversation

@ZeldaHuang
Contributor


Purpose

Refs #1048 and #1102.
The talker MTP module hit a startup error when using torch.compile with a small max_batch_size (1 or 2).

  • Changed position_ids to be computed dynamically via torch.arange().repeat(), avoiding the batch-size specialization issue while preserving correct behavior (credit to @ram16g)
  • Aligned the talker MTP buffer size with the maximum CUDA graph capture size (which may be larger than max_num_seqs)
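As a rough sketch of the first change (function name is hypothetical, not the actual vLLM-omni code), position_ids is now built on the fly from the runtime batch size rather than sliced out of a pre-allocated buffer:

```python
import torch

def compute_position_ids(batch_size: int, seq_len: int, device=None) -> torch.Tensor:
    # torch.arange().repeat() derives the output shape from the runtime
    # batch_size, so Dynamo can trace it as a dynamic dimension instead of
    # specializing it to a constant, as slicing a fixed buffer can do.
    return torch.arange(seq_len, device=device).repeat(batch_size, 1)
```

Each row is simply [0, 1, ..., seq_len - 1], matching what a pre-computed buffer would have held for that batch entry.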

Test Plan

Test Result

Confirmed working with small max_batch_size values (1, 2, and 3).



ram16g and others added 2 commits January 30, 2026 14:40
…predictor

The original code used a pre-computed position_ids_buffer with batch_size-dependent
slicing, which caused torch.compile to specialize batch_size as a constant. This
conflicted with vLLM's @support_torch_compile decorator, which marks batch_size as
dynamic, resulting in a ConstraintViolationError.

Changed to dynamically compute position_ids using torch.arange().repeat() to avoid
the specialization issue while maintaining correct behavior.

Signed-off-by: ram16g <[email protected]>
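To illustrate the difference described in the commit message (a minimal sketch with hypothetical names, not the code from this PR): slicing a pre-computed buffer ties the output shape to a Python int that Dynamo may guard on, while the dynamic version does not. In eager mode the two are behaviorally identical:

```python
import torch

MAX_BATCH, SEQ_LEN = 8, 4

# Original pattern: pre-computed buffer, sliced by the runtime batch_size.
position_ids_buffer = torch.arange(SEQ_LEN).repeat(MAX_BATCH, 1)

def static_ids(batch_size: int) -> torch.Tensor:
    # Slicing a fixed buffer can lead torch.compile to burn batch_size
    # in as a constant, conflicting with a dynamic-dim marking.
    return position_ids_buffer[:batch_size]

def dynamic_ids(batch_size: int) -> torch.Tensor:
    # Fixed pattern: computed per call, with no fixed-size buffer involved.
    return torch.arange(SEQ_LEN).repeat(batch_size, 1)

# Identical results eagerly; they differ only in how torch.compile
# treats batch_size during tracing.
assert torch.equal(static_ids(3), dynamic_ids(3))
```

The assertion passes in eager mode for any batch_size up to MAX_BATCH; the specialization problem only surfaces once the module is compiled with batch_size marked dynamic.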
@david6666666 david6666666 added this to the v0.14.0 milestone Jan 30, 2026
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 30, 2026
@david6666666 david6666666 linked an issue Jan 30, 2026 that may be closed by this pull request
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


lgtm

@hsliuustc0106 hsliuustc0106 merged commit f6cfc0d into vllm-project:main Jan 30, 2026
7 checks passed
@gcanlin gcanlin mentioned this pull request Jan 30, 2026
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026
…project#1104)

Signed-off-by: ram16g <[email protected]>
Signed-off-by: ZeldaHuang <[email protected]>
Co-authored-by: ram16g <[email protected]>
Co-authored-by: Hongsheng Liu <[email protected]>

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Bug]: qwen3-omni realtime audio return random voice and noise

4 participants