
[Misc] Qwen-Omni support offline inference with local files #167

Merged
DarkLight1337 merged 2 commits into vllm-project:main from SamitHuang:qwenomni_local_misc
Dec 2, 2025

Conversation

@SamitHuang (Collaborator) commented on Dec 2, 2025

Purpose

Resolve #161

Test Plan

cd examples/offline_inference/qwen3_omni
# cd examples/offline_inference/qwen2.5_omni

Then run one of the following:

# Use local video file
python end2end.py --query-type use_video --video-path /path/to/video.mp4

# Use local image file
python end2end.py --query-type use_image --image-path /path/to/image.jpg

# Use local audio file
python end2end.py --query-type use_audio --audio-path /path/to/audio.wav
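
For reference, below is a minimal, hypothetical sketch of how the local-file flags above could be turned into in-memory multi-modal inputs. It follows vLLM's multi_modal_data convention for images and audio; the helper name and the video handling are assumptions for illustration and are not the actual end2end.py code.

# Illustrative only: map the local-file flags used above to in-memory
# multi-modal inputs; the real example scripts may wire things up differently.
import argparse

import librosa            # pip install librosa
from PIL import Image     # pip install pillow


def load_multi_modal_data(args: argparse.Namespace) -> dict:
    """Load local media files referenced on the command line."""
    mm_data = {}
    if args.image_path:
        mm_data["image"] = Image.open(args.image_path).convert("RGB")
    if args.audio_path:
        # Audio is commonly passed as a (waveform, sampling_rate) tuple.
        waveform, sr = librosa.load(args.audio_path, sr=16000)
        mm_data["audio"] = (waveform, sr)
    if args.video_path:
        # Video handling is model-specific; here we only record the path and
        # leave frame extraction to the model's processor.
        mm_data["video"] = args.video_path
    return mm_data


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--query-type",
                        choices=["use_video", "use_image", "use_audio"])
    parser.add_argument("--image-path")
    parser.add_argument("--audio-path")
    parser.add_argument("--video-path")
    print(load_multi_modal_data(parser.parse_args()))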

Test Result

--------------------------------
[Stage-0] Received batch size=1, request_ids=[0]
--------------------------------
[Stage-0] Generate done: batch=1, req_ids=[0], gen_ms=7178.5
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=[0]
--------------------------------
(EngineCore_DP0 pid=1750793) /home/public/yx/vllm-omni/vllm_omni/worker/gpu_model_runner.py:207: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(EngineCore_DP0 pid=1750793)   info_dict[k] = torch.from_numpy(arr)
[Stage-1] Generate done: batch=1, req_ids=[0], gen_ms=12316.4
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=[0]
--------------------------------
(EngineCore_DP0 pid=1751740) INFO:vllm_omni.model_executor.models.qwen2_5_omni.qwen2_5_omni:Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=[0], gen_ms=4843.1
INFO:vllm_omni.entrypoints.omni_llm:[Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 24620.004653930664, 'e2e_sum_time_ms': 24619.704723358154, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 24619.704723358154, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 24620.004653930664, 'final_stage_id': 2, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 89, 'total_time_ms': 7263.629674911499, 'avg_time_per_request_ms': 7263.629674911499, 'avg_tokens_per_s': 12.252827303050026}, {'stage_id': 1, 'requests': 1, 'tokens': 1082, 'total_time_ms': 12408.075332641602, 'avg_time_per_request_ms': 12408.075332641602, 'avg_tokens_per_s': 87.20127586214848}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 4851.06635093689, 'avg_time_per_request_ms': 4851.06635093689, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 36376783, 'total_time_ms': 47.77932167053223, 'tx_mbps': 6090.799404954347, 'rx_samples': 1, 'rx_total_bytes': 36376783, 'rx_total_time_ms': 70.0995922088623, 'rx_mbps': 4151.44018431549, 'total_samples': 1, 'total_transfer_time_ms': 118.60966682434082, 'total_mbps': 2453.545919077472}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 3265, 'total_time_ms': 0.21910667419433594, 'tx_mbps': 119.21133893362351, 'rx_samples': 1, 'rx_total_bytes': 3265, 'rx_total_time_ms': 0.03528594970703125, 'rx_mbps': 740.2379762162162, 'total_samples': 1, 'total_transfer_time_ms': 0.8287429809570312, 'total_mbps': 31.517612336018413}]}
[rank0]:[W1202 10:13:07.573033067 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1202 10:13:07.603865240 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1202 10:13:07.616500689 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Request ID: 0, Text saved to output_audio/00000.txt
Request ID: 0, Saved audio to output_audio/output_0.wav
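
A note on the UserWarning in the Stage-1 log: it is raised because torch.from_numpy receives a read-only NumPy array. It is harmless here, but if it ever needs to be silenced, the usual remedy is to copy the array first; a minimal illustration (not a change made in this PR):

import numpy as np
import torch

arr = np.frombuffer(b"\x00" * 8, dtype=np.float32)  # read-only array, as in the warning
tensor = torch.from_numpy(arr.copy())                # copying yields a writable buffer, no warning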


@DarkLight1337 (Member) commented

cc @Isotr0py @ywang96: as with the other PR, let's try to upstream support for loading from local paths.

@DarkLight1337 merged commit 5a0d788 into vllm-project:main on Dec 2, 2025
4 checks passed
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

Development

Successfully merging this pull request may close these issues:
  • [Feature]: Support local file prompt in offline inference
