[Model] Add end2end example and documentation for qwen2.5-omni #14
hsliuustc0106 merged 3 commits into vllm-project:main
Conversation
Summary of Changes

Hello @Gaohan123, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a complete offline end-to-end example for the Qwen2.5-omni model, enabling users to perform single-request inference and generate audio outputs from text prompts. It includes all necessary scripts and utility functions for setting up the environment, processing diverse multimodal inputs, and running the model, along with clear documentation to facilitate usage.
Code Review
This pull request introduces an offline end-to-end example for the Qwen2.5-omni model, including documentation and code for running the example. The changes involve adding a README, a Python script (end2end.py), a utility script (utils.py), a processing script (processing_omni.py), and a shell script (run.sh). The review focuses on correctness and maintainability, with an emphasis on code clarity and potential issues in the image and video processing logic.
```python
if 'video_start' in ele or 'video_end' in ele:
    raise NotImplementedError(
        "not support start_pts and end_pts in decord for now.")
```
It's critical to handle the case where `video_start` or `video_end` is provided on the decord path, which currently raises a `NotImplementedError`. Either implement the functionality or provide a more informative error message suggesting that the user switch to torchvision if they need to specify start and end points.
```diff
 if 'video_start' in ele or 'video_end' in ele:
+    logger.error("start_pts and end_pts are not supported in decord. "
+                 "Please use torchvision or remove these arguments.")
     raise NotImplementedError(
         "not support start_pts and end_pts in decord for now.")
```
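One way to act on this review beyond a better error message is to route requests that carry timestamps to a backend that supports them, instead of failing. The helper below is a hypothetical sketch (the function name and backend strings are not from the PR); it assumes torchvision's video reader can honor start/end points, as the review suggests:

```python
def choose_video_backend(ele: dict) -> str:
    """Pick a video reader backend for one content element.

    decord is typically faster, but the snippet above cannot honor
    'video_start'/'video_end' with it, so fall back to torchvision
    whenever those keys are present.
    """
    if 'video_start' in ele or 'video_end' in ele:
        return 'torchvision'
    return 'decord'
```

With a dispatch like this in front of the reader, the `NotImplementedError` branch becomes unreachable for well-formed inputs.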
```python
resp = requests.get(video_url)
assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
```
It's critical to handle potential exceptions when fetching the video. Add a try-except block to catch potential requests.exceptions.RequestException or other network-related errors, and log an appropriate error message. This will prevent the program from crashing if the video cannot be downloaded.
```diff
+try:
-resp = requests.get(video_url)
-assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+    resp = requests.get(video_url)
+    assert resp.status_code == requests.codes.ok, f"Failed to fetch video from {video_url}, status_code:{resp.status_code}, resp:{resp}"
+except requests.exceptions.RequestException as e:
+    logger.error(f"Failed to fetch video from {video_url}: {e}")
+    raise
```
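A sketch of the hardened download, going slightly beyond the suggestion: `raise_for_status()` replaces the `assert`, which is silently stripped under `python -O`. The function name `fetch_video_bytes` and the timeout value are illustrative, not from the PR:

```python
import requests


def fetch_video_bytes(video_url: str, timeout: float = 30.0) -> bytes:
    """Download a video, converting HTTP status errors and network
    failures into one exception type with a clear message."""
    try:
        resp = requests.get(video_url, timeout=timeout)
        resp.raise_for_status()  # 4xx/5xx -> requests.exceptions.HTTPError
    except requests.exceptions.RequestException as e:
        raise RuntimeError(f"Failed to fetch video from {video_url}: {e}") from e
    return resp.content
```

Passing an explicit `timeout` also prevents the request from hanging indefinitely, which the original code allowed.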
```python
if max(height, width) / min(height, width) > MAX_RATIO:
    raise ValueError(
        f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}"
    )
```
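Pulling this check into a small helper computes the ratio once and makes it testable on its own; `MAX_RATIO = 200` is an assumed value for illustration, not necessarily the constant defined in processing_omni.py:

```python
MAX_RATIO = 200  # assumed value for illustration


def check_aspect_ratio(height: int, width: int) -> None:
    """Reject images/frames whose aspect ratio is too extreme."""
    ratio = max(height, width) / min(height, width)
    if ratio > MAX_RATIO:
        raise ValueError(
            f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {ratio}")
```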
```python
if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
    raise ValueError(
        f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
    )
```
The condition `not (FRAME_FACTOR <= nframes and nframes <= total_frames)` can be simplified to `nframes < FRAME_FACTOR or nframes > total_frames` for better readability.
```diff
-if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
+if nframes < FRAME_FACTOR or nframes > total_frames:
     raise ValueError(
         f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}."
     )
```
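The two forms are equivalent by De Morgan's laws; a quick exhaustive check over a small range (with an assumed `FRAME_FACTOR`) confirms they flag exactly the same values:

```python
FRAME_FACTOR = 2  # assumed value for illustration


def out_of_range_old(nframes: int, total_frames: int) -> bool:
    return not (FRAME_FACTOR <= nframes and nframes <= total_frames)


def out_of_range_new(nframes: int, total_frames: int) -> bool:
    return nframes < FRAME_FACTOR or nframes > total_frames


# both predicates reject the same (nframes, total_frames) pairs
assert all(out_of_range_old(n, t) == out_of_range_new(n, t)
           for n in range(12) for t in range(12))
```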
```python
elif 'video_url' in ele:
    ele['type'] = 'video_url'
else:
    raise ValueError(f'Unknown ele: {ele}')
```
Consider adding a default value for `ele['type']` to prevent a `ValueError` when the type is missing. This can be achieved by using `ele.get('type', 'text')` or a similar approach.
```diff
-    raise ValueError(f'Unknown ele: {ele}')
+ele_type = ele.get('type', 'text')  # Default to 'text' if type is missing
+if ele_type == 'audio' or ele_type == 'audio_url':
```
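The dispatch could also be captured in one helper that combines the existing key-based inference with the reviewer's suggested default. The function name and the exact key list here are hypothetical, and silently defaulting to `'text'` is the suggestion's behavior, not the merged code's:

```python
def infer_ele_type(ele: dict) -> str:
    """Infer a content element's type from an explicit 'type' field or,
    failing that, from which payload key it carries."""
    if 'type' in ele:
        return ele['type']
    for key in ('audio_url', 'image_url', 'video_url'):
        if key in ele:
            return key
    return 'text'  # reviewer-suggested default instead of raising ValueError
```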
```bash
bash run.sh
```
The output audio is saved in `./output_audio`.
Do we have a test result?

The output is a .wav audio file. Do we need to add it to the example folder?

Already updated the test plan and result. Fixed.
Fixed the necessary reviews from #24; other reviews are left for further development.
lgtm |
[Model] Add end2end example and documentation for qwen2.5-omni
Purpose
This PR implements the Phase 4 features in #10. It adds an offline end2end example for the Qwen2.5-omni model and provides initial documentation for running the example.
Test Plan
Go into the example folder, modify PYTHONPATH in run.sh to point to your vllm_omni path, then run it.
Test Result
The output is:
```
Request ID: 0, Text Output: Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask.
Request ID: 0, Saved audio to output_audio/output_0.wav
```
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/hsliuustc0106/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)