[NPU][Model] Support Qwen3-Omni on NPU by gcanlin · Pull Request #484 · vllm-project/vllm-omni

gcanlin · 2025-12-26T06:46:02Z


Purpose

Following #266, this PR adds support for running Qwen3-Omni on NPU.

Test Plan

python end2end.py --output-wav output_audio \
                   --query-type text

Test Result

INFO 12-26 06:40:55 [log_utils.py:529] {'type': 'request_level_metrics',
INFO 12-26 06:40:55 [log_utils.py:529]  'request_id': '0_58ef1eab-5f96-4eaf-88cf-bf5d0fa306f8',
INFO 12-26 06:40:55 [log_utils.py:529]  'e2e_time_ms': 38717.89574623108,
INFO 12-26 06:40:55 [log_utils.py:529]  'e2e_tpt': 0.0,
INFO 12-26 06:40:55 [log_utils.py:529]  'num_tokens_out': 0,
INFO 12-26 06:40:55 [log_utils.py:529]  'transfers_total_time_ms': 9.086847305297852,
INFO 12-26 06:40:55 [log_utils.py:529]  'transfers_total_bytes': 2194264,
INFO 12-26 06:40:55 [log_utils.py:529]  'stages': {0: {'stage_gen_time_ms': 2264.1279697418213, 'num_tokens_out': 27},
INFO 12-26 06:40:55 [log_utils.py:529]             1: {'stage_gen_time_ms': 13507.445573806763, 'num_tokens_out': 110},
INFO 12-26 06:40:55 [log_utils.py:529]             2: {'stage_gen_time_ms': 21961.585521697998, 'num_tokens_out': 0}}}
INFO 12-26 06:40:55 [omni_llm.py:476] [Summary] {'e2e_requests': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]  'e2e_total_time_ms': 38718.159437179565,
INFO 12-26 06:40:55 [omni_llm.py:476]  'e2e_sum_time_ms': 38717.89574623108,
INFO 12-26 06:40:55 [omni_llm.py:476]  'e2e_total_tokens': 0,
INFO 12-26 06:40:55 [omni_llm.py:476]  'e2e_avg_time_per_request_ms': 38717.89574623108,
INFO 12-26 06:40:55 [omni_llm.py:476]  'e2e_avg_tokens_per_s': 0.0,
INFO 12-26 06:40:55 [omni_llm.py:476]  'wall_time_ms': 38718.159437179565,
INFO 12-26 06:40:55 [omni_llm.py:476]  'final_stage_id': {'0_58ef1eab-5f96-4eaf-88cf-bf5d0fa306f8': 2},
INFO 12-26 06:40:55 [omni_llm.py:476]  'stages': [{'stage_id': 0,
INFO 12-26 06:40:55 [omni_llm.py:476]              'requests': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]              'tokens': 27,
INFO 12-26 06:40:55 [omni_llm.py:476]              'total_time_ms': 2277.573823928833,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_time_per_request_ms': 2277.573823928833,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_tokens_per_s': 11.854720016681956},
INFO 12-26 06:40:55 [omni_llm.py:476]             {'stage_id': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]              'requests': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]              'tokens': 110,
INFO 12-26 06:40:55 [omni_llm.py:476]              'total_time_ms': 13517.81678199768,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_time_per_request_ms': 13517.81678199768,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_tokens_per_s': 8.137408708371623},
INFO 12-26 06:40:55 [omni_llm.py:476]             {'stage_id': 2,
INFO 12-26 06:40:55 [omni_llm.py:476]              'requests': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]              'tokens': 0,
INFO 12-26 06:40:55 [omni_llm.py:476]              'total_time_ms': 21966.142654418945,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_time_per_request_ms': 21966.142654418945,
INFO 12-26 06:40:55 [omni_llm.py:476]              'avg_tokens_per_s': 0.0}],
INFO 12-26 06:40:55 [omni_llm.py:476]  'transfers': [{'from_stage': 0,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'to_stage': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_bytes': 2188830,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_time_ms': 3.2148361206054688,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'tx_mbps': 5446.8219663719965,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_total_bytes': 2188830,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_total_time_ms': 3.0269622802734375,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_mbps': 5784.888736181474,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_transfer_time_ms': 6.963253021240234,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_mbps': 2514.7212009367936},
INFO 12-26 06:40:55 [omni_llm.py:476]                {'from_stage': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'to_stage': 2,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_bytes': 5434,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_time_ms': 0.2963542938232422,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'tx_mbps': 146.68928679646018,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_total_bytes': 5434,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_total_time_ms': 1.2280941009521484,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'rx_mbps': 35.397938941564746,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_samples': 1,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_transfer_time_ms': 2.123594284057617,
INFO 12-26 06:40:55 [omni_llm.py:476]                 'total_mbps': 20.470953574491972}]}
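
The derived fields in the summary can be cross-checked against the raw ones. Judging from the logged numbers (not from the vllm-omni source), avg_tokens_per_s appears to be tokens / (total_time_ms / 1000), and the tx_mbps, rx_mbps and total_mbps fields appear to be total_bytes × 8 / (time_ms × 1000), i.e. decimal megabits per second. The sketch below recomputes them from values copied out of the log above; the helper names tokens_per_s and mbps are hypothetical.

```python
# Raw values copied from the request_level_metrics and [Summary] logs above.
E2E_TIME_MS = 38717.89574623108

STAGES = {  # stage_id -> (tokens, total_time_ms)
    0: (27, 2277.573823928833),
    1: (110, 13517.81678199768),
    2: (0, 21966.142654418945),
}

# (from_stage, to_stage, total_bytes, tx_time_ms, rx_time_ms, total_transfer_time_ms)
TRANSFERS = [
    (0, 1, 2188830, 3.2148361206054688, 3.0269622802734375, 6.963253021240234),
    (1, 2, 5434, 0.2963542938232422, 1.2280941009521484, 2.123594284057617),
]


def tokens_per_s(tokens: int, time_ms: float) -> float:
    """Throughput in tokens per second; 0.0 when no time was recorded."""
    return tokens / (time_ms / 1000.0) if time_ms > 0 else 0.0


def mbps(num_bytes: int, time_ms: float) -> float:
    """Decimal megabits per second: (bytes * 8) / (seconds * 1e6)."""
    return num_bytes * 8 / (time_ms * 1000.0) if time_ms > 0 else 0.0


if __name__ == "__main__":
    for sid, (tokens, ms) in STAGES.items():
        print(f"stage {sid}: {tokens_per_s(tokens, ms):.3f} tok/s, "
              f"{ms / E2E_TIME_MS:.1%} of e2e_time_ms")
    for src, dst, nbytes, tx_ms, rx_ms, total_ms in TRANSFERS:
        print(f"{src}->{dst}: tx {mbps(nbytes, tx_ms):.1f} Mbps, "
              f"rx {mbps(nbytes, rx_ms):.1f} Mbps, "
              f"total {mbps(nbytes, total_ms):.1f} Mbps")
    # Request-level transfer totals: sums over both stage-to-stage transfers.
    print("transfers_total_bytes:", sum(t[2] for t in TRANSFERS))
    print("transfers_total_time_ms:", sum(t[5] for t in TRANSFERS))
```

Computed this way, the formulas reproduce every logged rate, and the request-level transfers_total_bytes (2194264) and transfers_total_time_ms (about 9.087 ms) equal the sums over the two transfers. Stage 2, the final stage, which reports 0 output tokens, accounts for about 22.0 s of the 38.7 s end-to-end time.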

text:

Prompt:
<|im_start|>system
You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.<|im_end|>
<|im_start|>user
Explain the system architecture for a scalable audio generation pipeline. Answer in 15 words.<|im_end|>
<|im_start|>assistant

vllm_text_output:
Modular microservices, distributed processing, load balancing, cloud scalability, real-time streaming, and GPU-accelerated inference.

audio (converted to MP4 with ffmpeg so it can be shown in the PR):

ffmpeg -i output_audio/output_0_58ef1eab-5f96-4eaf-88cf-bf5d0fa306f8.wav -c:a aac -b:a 192k output.mp4
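
To convert every WAV that end2end.py writes to output_audio rather than one file at a time, a small wrapper can reuse the exact ffmpeg flags above (-i, -c:a aac, -b:a 192k). This is a hypothetical helper: build_ffmpeg_cmd and convert_dir are illustrative names, not part of vllm-omni, and the sketch assumes ffmpeg is installed and on PATH. Unlike the one-off command, it writes each .mp4 next to its source .wav with the same file stem instead of a single output.mp4.

```python
"""Hypothetical batch converter: output_audio/*.wav -> AAC in .mp4 via ffmpeg."""
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(wav: Path, mp4: Path, bitrate: str = "192k") -> list[str]:
    # Same flags as the one-off command: AAC audio at the given bitrate.
    return ["ffmpeg", "-i", str(wav), "-c:a", "aac", "-b:a", bitrate, str(mp4)]


def convert_dir(audio_dir: str, dry_run: bool = False) -> list[list[str]]:
    """Convert every *.wav in audio_dir to a sibling .mp4; return the commands.

    With dry_run=True nothing is executed; only the command lists are returned.
    """
    cmds = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        cmd = build_ffmpeg_cmd(wav, wav.with_suffix(".mp4"))
        cmds.append(cmd)
        if not dry_run:
            if shutil.which("ffmpeg") is None:
                raise RuntimeError("ffmpeg not found on PATH")
            # check=True raises CalledProcessError if ffmpeg exits non-zero.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    convert_dir("output_audio")
```

Passing dry_run=True only returns the command lists, which makes it easy to inspect them before running anything. No -y or -n flag is added, so ffmpeg's usual handling of an already existing output file applies.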

output-qwen3-omni.mp4


Signed-off-by: gcanlin <canlinguosdu@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


vllm_omni/model_executor/stage_configs/npu/qwen3_omni_moe.yaml

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin · 2025-12-26T10:46:37Z

@hsliuustc0106 Could we merge this now? We want to make sure at least the basic Qwen3-Omni functionality on NPU is available in main. Thanks again for reviewing!

hsliuustc0106

lgtm

Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>

Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

[NPU][Model] Support Qwen3-Omni on NPU

0478ebd

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin requested a review from hsliuustc0106 as a code owner December 26, 2025 06:46

chatgpt-codex-connector bot reviewed Dec 26, 2025


vllm_omni/model_executor/stage_configs/npu/qwen3_omni_moe.yaml

fix pre-commit lint

734499c

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

hsliuustc0106 added the ready label to trigger buildkite CI label Dec 26, 2025

Merge branch 'main' into qwen3-omni-new

6729198

hsliuustc0106 approved these changes Dec 26, 2025


hsliuustc0106 merged commit 52d15c4 into vllm-project:main Dec 26, 2025
7 checks passed

david6666666 mentioned this pull request Dec 29, 2025

[Doc]: Qwen3 omni running on Ascend #343 (closed, 1 task)

princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

[NPU][Model] Support Qwen3-Omni on NPU (vllm-project#484)

40ec179

Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

hsliuustc0106 merged 3 commits into vllm-project:main from gcanlin:qwen3-omni-new