[Model] Add end2end example and documentation for qwen2.5-omni #14

Merged

hsliuustc0106 merged 3 commits into vllm-project:main from Gaohan123:end2end_example on Oct 24, 2025
Conversation

@Gaohan123 (Collaborator) commented Oct 22, 2025

Purpose

This PR implements the Phase 4 features described in #10. It adds an offline end-to-end example for the Qwen2.5-omni model, along with initial documentation for running the example.

Test Plan

Enter the example folder:

```bash
cd vllm_omni
cd examples/offline_inference/qwen2_5_omni
```

Set PYTHONPATH in run.sh to the path of your vllm_omni checkout, then run:

```bash
bash run.sh
```

Test Result

The output is:

Request ID: 0, Text Output: Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems. If you want to know more about any part of this, feel free to ask.

Request ID: 0, Saved audio to output_audio/output_0.wav


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@gemini-code-assist

Summary of Changes

Hello @Gaohan123, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a complete offline end-to-end example for the Qwen2.5-omni model, enabling users to perform single-request inference and generate audio outputs from text prompts. It includes all necessary scripts and utility functions for setting up the environment, processing diverse multimodal inputs, and running the model, along with clear documentation to facilitate usage.

Highlights

  • New End-to-End Example: An offline end-to-end example has been added for the Qwen2.5-omni model, demonstrating single-request inference capabilities.
  • Comprehensive Documentation: New documentation is provided to guide users through the installation and execution of the Qwen2.5-omni example.
  • Multimodal Input Processing: Utility functions for processing multimodal inputs, including smart resizing of images and videos, and handling various media sources (local files, URLs, base64), have been introduced.
  • Configurable Sampling Parameters: The example script demonstrates how to configure distinct sampling parameters for different stages of the multimodal generation process (thinker, talker, code2wav) to control output characteristics.
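
For readers unfamiliar with the "smart resizing" mentioned in the highlights, here is a minimal sketch of how such a routine is commonly implemented: round height and width to multiples of a patch factor while keeping the total pixel count inside a budget. This is a hypothetical reconstruction, not the PR's actual code; the constants IMAGE_FACTOR, MIN_PIXELS, MAX_PIXELS, and MAX_RATIO are assumed values.

```python
import math

# Assumed constants, not taken from the PR.
IMAGE_FACTOR = 28
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28
MAX_RATIO = 200


def smart_resize(height, width, factor=IMAGE_FACTOR,
                 min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Round (height, width) to multiples of `factor` while keeping the
    total pixel count within [min_pixels, max_pixels]."""
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError(
            f"absolute aspect ratio must be smaller than {MAX_RATIO}, "
            f"got {max(height, width) / min(height, width)}")
    # Round to the nearest multiple of `factor`, never below one patch.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: scale both sides down, flooring to a factor multiple.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too small: scale both sides up, ceiling to a factor multiple.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar
```

The same pattern applies to video frames, with the pixel budget additionally divided by the number of sampled frames.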

@gemini-code-assist left a comment:
Code Review

This pull request introduces an offline end-to-end example for the Qwen2.5-omni model, including documentation and code for running the example. The changes involve adding a README, a Python script (end2end.py), a utility script (utils.py), a processing script (processing_omni.py), and a shell script (run.sh). The review focuses on correctness and maintainability, with an emphasis on code clarity and potential issues in the image and video processing logic.

Comment on lines +242 to +244:

```python
if 'video_start' in ele or 'video_end' in ele:
    raise NotImplementedError(
        "not support start_pts and end_pts in decord for now.")
```

critical

It's critical to handle the case where video_start or video_end are provided to decord, as it currently raises a NotImplementedError. Either implement the functionality or provide a more informative error message suggesting that the user switch to torchvision if they need to specify start and end points.

Suggested change:

```diff
-if 'video_start' in ele or 'video_end' in ele:
-    raise NotImplementedError(
-        "not support start_pts and end_pts in decord for now.")
+if 'video_start' in ele or 'video_end' in ele:
+    logger.error("start_pts and end_pts are not supported in decord. Please use torchvision or remove these arguments.")
+    raise NotImplementedError("not support start_pts and end_pts in decord for now.")
```

Comment on lines +91 to +92:

```python
resp = requests.get(video_url)
assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
```

critical

It's critical to handle potential exceptions when fetching the video. Add a try-except block to catch requests.exceptions.RequestException or other network-related errors, and log an appropriate error message. This prevents the program from crashing if the video cannot be downloaded.

Suggested change:

```diff
-resp = requests.get(video_url)
-assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+try:
+    resp = requests.get(video_url)
+    assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+except requests.exceptions.RequestException as e:
+    logger.error(f"Failed to fetch video from {video_url}: {e}")
+    raise
```

Comment on lines +69 to +72:

```python
if max(height, width) / min(height, width) > MAX_RATIO:
    raise ValueError(
        f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}"
    )
```

medium

The aspect ratio check might be too strict. Consider allowing a slightly higher ratio and logging a warning instead of raising an error. This would allow processing of images that are slightly outside the ideal ratio while still rejecting extreme cases.
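
One way to realize this suggestion is a soft limit with a hard ceiling, as in the sketch below. This is hypothetical code, not part of the PR; MAX_RATIO and SOFT_RATIO_MARGIN are assumed values.

```python
import logging

logger = logging.getLogger(__name__)

MAX_RATIO = 200          # assumed recommended limit
SOFT_RATIO_MARGIN = 1.1  # hypothetical tolerance above the limit


def check_aspect_ratio(height: int, width: int) -> None:
    """Warn for ratios slightly above MAX_RATIO; raise only for extremes."""
    ratio = max(height, width) / min(height, width)
    if ratio > MAX_RATIO * SOFT_RATIO_MARGIN:
        # Hard ceiling: refuse clearly degenerate images.
        raise ValueError(f"aspect ratio {ratio:.1f} exceeds hard limit "
                         f"{MAX_RATIO * SOFT_RATIO_MARGIN:.1f}")
    if ratio > MAX_RATIO:
        # Soft limit: proceed, but tell the user the image is unusual.
        logger.warning("aspect ratio %.1f exceeds recommended limit %d; "
                       "processing anyway", ratio, MAX_RATIO)
```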

Comment on lines +171 to +174:

```python
if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
    raise ValueError(
        f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
    )
```

medium

The condition not (FRAME_FACTOR <= nframes and nframes <= total_frames) can be simplified to nframes < FRAME_FACTOR or nframes > total_frames for better readability.

Suggested change:

```diff
-if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
+if nframes < FRAME_FACTOR or nframes > total_frames:
     raise ValueError(
         f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
     )
```
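
The simplified check can be wrapped as a small helper, shown below as a sketch; FRAME_FACTOR is an assumed constant and validate_nframes is a hypothetical name, not the PR's actual code.

```python
FRAME_FACTOR = 2  # assumed constant from the example's processing code


def validate_nframes(nframes: int, total_frames: int) -> None:
    """Reject frame counts outside [FRAME_FACTOR, total_frames]."""
    if nframes < FRAME_FACTOR or nframes > total_frames:
        raise ValueError(
            f"nframes should be in interval [{FRAME_FACTOR}, "
            f"{total_frames}], but got {nframes}.")
```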

```python
elif 'video_url' in ele:
    ele['type'] = 'video_url'
else:
    raise ValueError(f'Unknown ele: {ele}')
```

medium

Consider adding a default value for ele['type'] to prevent a ValueError when the type is missing. This can be achieved with ele.get('type', 'text') or a similar approach.

Suggested change:

```diff
-    raise ValueError(f'Unknown ele: {ele}')
+    ele_type = ele.get('type', 'text')  # Default to 'text' if type is missing
+    if ele_type == 'audio' or ele_type == 'audio_url':
```
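
The reviewer's default-type idea could be fleshed out as a standalone helper that infers a content element's type from its keys when the 'type' field is absent. This is a hypothetical sketch (infer_ele_type is an invented name), not the PR's code.

```python
def infer_ele_type(ele: dict) -> str:
    """Return ele['type'] if present; otherwise infer it from known keys,
    defaulting to 'text' as the review suggests."""
    if 'type' in ele:
        return ele['type']
    for key in ('audio', 'audio_url', 'image', 'image_url',
                'video', 'video_url', 'text'):
        if key in ele:
            return key
    return 'text'  # default when nothing recognizable is present
```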

```bash
bash run.sh
```
The output audio is saved in ./output_audio
A Collaborator commented:
do we have a test result?

@Gaohan123 (Author) replied:
The output is a .wav audio file. Do we need to add it to example folder?

@Gaohan123 (Author):

Already updated the test plan and results. Fixed.

@Gaohan123

Fixed the necessary review items from #24; the other review comments are left for further development.

@hsliuustc0106 (Collaborator)

lgtm, approve
@hsliuustc0106 hsliuustc0106 merged commit d41d3e4 into vllm-project:main Oct 24, 2025
@Gaohan123 Gaohan123 deleted the end2end_example branch November 1, 2025 02:25
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
[Model] Add end2end example and documentation for qwen2.5-omni