[Model] Add end2end example and documentation for qwen2.5-omni #14
hsliuustc0106 merged 3 commits into vllm-project:main
Conversation
Summary of Changes

Hello @Gaohan123, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a complete offline end-to-end example for the Qwen2.5-omni model, enabling users to perform single-request inference and generate audio outputs from text prompts. It includes all necessary scripts and utility functions for setting up the environment, processing diverse multimodal inputs, and running the model, along with clear documentation to facilitate usage.
Code Review
This pull request introduces an offline end-to-end example for the Qwen2.5-omni model, including documentation and code for running the example. The changes involve adding a README, a Python script (end2end.py), a utility script (utils.py), a processing script (processing_omni.py), and a shell script (run.sh). The review focuses on correctness and maintainability, with an emphasis on code clarity and potential issues in the image and video processing logic.
```python
if 'video_start' in ele or 'video_end' in ele:
    raise NotImplementedError(
        "not support start_pts and end_pts in decord for now.")
```
It's critical to handle the case where `video_start` or `video_end` is provided on the decord path, which currently raises a `NotImplementedError`. Either implement the functionality or provide a more informative error message suggesting that the user switch to torchvision if they need to specify start and end points.
```diff
 if 'video_start' in ele or 'video_end' in ele:
+    logger.error("start_pts and end_pts are not supported in decord. "
+                 "Please use torchvision or remove these arguments.")
     raise NotImplementedError(
         "not support start_pts and end_pts in decord for now.")
```
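One way to act on this review beyond a better error message is to route requests that carry timestamps to a backend that supports them, instead of failing. The helper below is a hypothetical sketch (the function name and backend strings are not from the PR); it assumes torchvision's video reader can honor start/end points, as the review suggests:

```python
def choose_video_backend(ele: dict) -> str:
    """Pick a video reader backend for one content element.

    decord is typically faster, but the snippet above cannot honor
    'video_start'/'video_end' with it, so fall back to torchvision
    whenever those keys are present.
    """
    if 'video_start' in ele or 'video_end' in ele:
        return 'torchvision'
    return 'decord'
```

With a dispatch like this in front of the reader, the `NotImplementedError` branch becomes unreachable for well-formed inputs.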
```python
resp = requests.get(video_url)
assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
```
It's critical to handle potential exceptions when fetching the video. Add a try-except block to catch potential requests.exceptions.RequestException or other network-related errors, and log an appropriate error message. This will prevent the program from crashing if the video cannot be downloaded.
```diff
+try:
-resp = requests.get(video_url)
-assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+    resp = requests.get(video_url)
+    assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+except requests.exceptions.RequestException as e:
+    logger.error(f"Failed to fetch video from {video_url}: {e}")
+    raise
```
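A sketch of the hardened download, going slightly beyond the suggestion: `raise_for_status()` replaces the `assert`, which is silently stripped under `python -O`. The function name `fetch_video_bytes` and the timeout value are illustrative, not from the PR:

```python
import requests


def fetch_video_bytes(video_url: str, timeout: float = 30.0) -> bytes:
    """Download a video, converting HTTP status errors and network
    failures into one exception type with a clear message."""
    try:
        resp = requests.get(video_url, timeout=timeout)
        resp.raise_for_status()  # 4xx/5xx -> requests.exceptions.HTTPError
    except requests.exceptions.RequestException as e:
        raise RuntimeError(f"Failed to fetch video from {video_url}: {e}") from e
    return resp.content
```

Passing an explicit `timeout` also prevents the request from hanging indefinitely, which the original code allowed.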
```python
if max(height, width) / min(height, width) > MAX_RATIO:
    raise ValueError(
        f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}"
    )
```
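Pulling this check into a small helper computes the ratio once and makes it testable on its own; `MAX_RATIO = 200` is an assumed value for illustration, not necessarily the constant defined in processing_omni.py:

```python
MAX_RATIO = 200  # assumed value for illustration


def check_aspect_ratio(height: int, width: int) -> None:
    """Reject images/frames whose aspect ratio is too extreme."""
    ratio = max(height, width) / min(height, width)
    if ratio > MAX_RATIO:
        raise ValueError(
            f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {ratio}")
```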
```python
if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
    raise ValueError(
        f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
    )
```
The condition `not (FRAME_FACTOR <= nframes and nframes <= total_frames)` can be simplified to `nframes < FRAME_FACTOR or nframes > total_frames` for better readability.
```diff
-if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
+if nframes < FRAME_FACTOR or nframes > total_frames:
     raise ValueError(
         f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
     )
```
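The two forms are equivalent by De Morgan's laws; a quick exhaustive check over a small range (with an assumed `FRAME_FACTOR`) confirms they flag exactly the same values:

```python
FRAME_FACTOR = 2  # assumed value for illustration


def out_of_range_old(nframes: int, total_frames: int) -> bool:
    return not (FRAME_FACTOR <= nframes and nframes <= total_frames)


def out_of_range_new(nframes: int, total_frames: int) -> bool:
    return nframes < FRAME_FACTOR or nframes > total_frames


# both predicates reject the same (nframes, total_frames) pairs
assert all(out_of_range_old(n, t) == out_of_range_new(n, t)
           for n in range(12) for t in range(12))
```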
```python
elif 'video_url' in ele:
    ele['type'] = 'video_url'
else:
    raise ValueError(f'Unknown ele: {ele}')
```
Consider adding a default value for `ele['type']` to prevent a `ValueError` when the type is missing. This can be achieved by using `ele.get('type', 'text')` or a similar approach.
```diff
-    raise ValueError(f'Unknown ele: {ele}')
+ele_type = ele.get('type', 'text')  # Default to 'text' if type is missing
+if ele_type == 'audio' or ele_type == 'audio_url':
```
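The dispatch could also be captured in one helper that combines the existing key-based inference with the reviewer's suggested default. The function name and the exact key list here are hypothetical, and silently defaulting to `'text'` is the suggestion's behavior, not the merged code's:

```python
def infer_ele_type(ele: dict) -> str:
    """Infer a content element's type from an explicit 'type' field or,
    failing that, from which payload key it carries."""
    if 'type' in ele:
        return ele['type']
    for key in ('audio_url', 'image_url', 'video_url'):
        if key in ele:
            return key
    return 'text'  # reviewer-suggested default instead of raising ValueError
```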
```bash
bash run.sh
```
The output audio is saved in `./output_audio`.
Do we have a test result?

The output is a .wav audio file. Do we need to add it to the example folder?

Already updated the test plan and result. Fixed.
Fixed the necessary reviews from #24; other reviews are left for further development.
lgtm |
[Model] Add end2end example and documentation for qwen2.5-omni
Purpose
This PR implements the Phase 4 features in #10. It adds an offline end2end example for the Qwen2.5-omni model and provides initial documentation for running the example.
Test Plan
Go into the example folder, modify PYTHONPATH in run.sh to point to your vllm_omni path, then run it.
Test Result
The output is:
```
Request ID: 0, Text Output: Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask.
Request ID: 0, Saved audio to output_audio/output_0.wav
```
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/hsliuustc0106/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)