[Doc][Bagel] Add BAGEL-7B-MoT documentation and edit the default stage configuration#987
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 144ecab1d8
Force-pushed from 54f6501 to fd93af3
Pull request overview
This pull request adds comprehensive documentation, example scripts, and configuration files for the BAGEL-7B-MoT multimodal model in vLLM-Omni. The PR addresses issue #936 by providing complete deployment guides for both online serving and offline inference modes.
Changes:
- Added single-GPU configuration file (
bagel_single_gpu.yaml) and updated dual-GPU config memory utilization - Created example Python scripts for text-to-image and image-to-text online serving
- Added comprehensive README documentation for both online serving and offline inference examples
- Added user guide documentation and shell scripts for various inference modes
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| vllm_omni/model_executor/stage_configs/bagel_single_gpu.yaml | New single-GPU configuration with reduced memory utilization (0.40/0.50) |
| vllm_omni/model_executor/stage_configs/bagel.yaml | Increased GPU memory utilization from 0.4 to 0.8 for the dual-GPU setup |
| examples/online_serving/bagel/t2i.py | Text-to-image example using the OpenAI SDK |
| examples/online_serving/bagel/i2t.py | Image-to-text example with a hardcoded-path issue |
| examples/online_serving/bagel/README.md | Comprehensive online serving documentation |
| examples/offline_inference/bagel/README.md | Detailed offline inference guide with setup instructions |
| examples/offline_inference/bagel/run_t2i.sh | Shell script for text-to-image inference |
| examples/offline_inference/bagel/run_t2t.sh | Shell script for text-to-text inference |
| examples/offline_inference/bagel/run_i2t.sh | Shell script for image-to-text inference |
| examples/offline_inference/bagel/run_t2t_multiple_prompt.sh | Batch text-to-text inference script |
| examples/offline_inference/bagel/text_prompts_10.txt | Sample text prompts file |
| examples/online_serving/bagel/cat.jpg | Sample image for the examples |
| docs/user_guide/examples/online_serving/bagel.md | User guide for online serving |
| docs/user_guide/examples/offline_inference/bagel.md | User guide for offline inference |
Force-pushed from e332929 to ab16373
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Force-pushed from b779ce2 to a019b81
PTAL ❤️ @princepride
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
- If you encounter warnings about flash_attn, try installing a lower version such as 2.8.1 with the command below.

  ```
  uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.1/flash_attn-2.8.1+cu12torch2.9cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  ```
Bagel directly uses vLLM's flash-attn; I don't think we need to install an extra flash-attn.
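If that is the case, a quick sanity check could look like the sketch below. The vllm.vllm_flash_attn module path reflects how recent vLLM wheels vendor flash-attn and is an assumption here, not something verified in this PR:

```python
# Hedged check: recent vLLM wheels bundle their own flash-attn build, so a
# separate flash_attn install should be unnecessary. The module path is assumed.
try:
    from vllm import vllm_flash_attn  # noqa: F401  (bundled with vLLM wheels)
    print("vLLM's bundled flash-attn is importable; no extra install needed.")
except ImportError:
    print("Bundled flash-attn not found; vLLM will pick another attention backend.")
```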
Force-pushed from 72e2539 to 49fe0df
I have deleted the unnecessary content. PTAL again. Thank you very much! ❤️
princepride left a comment
A small change is needed.
Thank you for your advice. ❤️ However, both qwen2.5_omni and qwen3_omni are written this way. @princepride
Okay
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.
Force-pushed from 49fe0df to 801d59a
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
The prompt was split into multiple requests because it was not enclosed in quotes. Signed-off-by: Ding Zuhao <[email protected]> Signed-off-by: jzz <[email protected]>
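To illustrate the bug this commit fixes: without quotes, the shell word-splits the prompt, so each word is parsed as a separate argument (and hence a separate request). The script name and --prompt flag below are hypothetical stand-ins:

```python
# Demonstration of shell word-splitting on an unquoted prompt.
import shlex

unquoted = './run_t2t.sh --prompt a cat on a mat'    # prompt not enclosed in quotes
quoted = './run_t2t.sh --prompt "a cat on a mat"'    # prompt enclosed in quotes

print(shlex.split(unquoted))
# ['./run_t2t.sh', '--prompt', 'a', 'cat', 'on', 'a', 'mat']  -> split into pieces
print(shlex.split(quoted))
# ['./run_t2t.sh', '--prompt', 'a cat on a mat']              -> one prompt argument
```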
…al online serving and offline inference. Signed-off-by: jzz <[email protected]>
- Add offline_inference and online_serving README files for the BAGEL model
- Add docs for both offline and online serving examples
- Create i2t.py and t2i.py example scripts using the OpenAI SDK
- Fix broken links with local Windows paths
- Fix typos and grammar issues (Staged -> Stages, add articles)
- Add language identifiers to code blocks (bash, python)
- Fix inline comments that would break shell commands

Signed-off-by: jzz <[email protected]>
Increased GPU memory utilization from 0.4 to 0.8 for model stages. Signed-off-by: Ding Zuhao <[email protected]> Signed-off-by: jzz <[email protected]>
Force-pushed from f9db4eb to 6e88ea9
Head branch was pushed to by a user without write access
Force-pushed from b68531f to cfd81dc
Force-pushed from 9de79f8 to 086bad5
Could you please merge this for me again? The CI has passed. Thank you very much! @hsliuustc0106 ❤️
Special thanks to my co-author @princepride ([email protected]) for the significant contributions to this work. ❤️
…e configuration (vllm-project#987) Signed-off-by: Ding Zuhao <[email protected]> Signed-off-by: jzz <[email protected]>


Purpose
Add comprehensive documentation and example test scripts for running the BAGEL-7B-MoT model in vLLM-Omni.
Addresses #936.
Test Plan
Tested on dual NVIDIA RTX 5000 Ada GPUs (32GB each) and one NVIDIA A100 (80GB).
Container: runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404
Ran all the commands in the README.
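One quick pre-flight step before running the README commands is to confirm the server is up. The port and the OpenAI-style /v1/models route below are assumptions based on typical vLLM defaults, not values taken from this PR:

```python
# Hedged sanity check that the serving endpoint is reachable before testing.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models", timeout=5) as resp:
    models = json.loads(resp.read())
print([m["id"] for m in models.get("data", [])])  # list the served model ids
```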
Test Result
All passed.
Changes

Documentation
- Added READMEs, user guides, and example scripts for BAGEL online serving and offline inference.

Configuration
- Edited the devices of vllm_omni/model_executor/stage_configs/bagel.yaml.

Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.
@princepride PTAL ❤️