Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 950ce0cc81
@hsliuustc0106 PTAL
update examples as well
Pull request overview
Adds tensor-parallel (TP) compatibility for the BAGEL diffusion pipeline by replacing non-TP-aware HF components with vLLM TP layers and updating weight-loading / vocab checks accordingly (addresses #1253).
Changes:
- Switch BAGEL’s Qwen2 MoT MLP, embedding, norms, and RoPE to vLLM TP-aware implementations and add a TP-aware `load_weights` on the BAGEL LM module.
- Update BAGEL pipeline vocab mismatch checks to use the global `vocab_size` (instead of the local embedding shard size under TP).
- Make BAGEL pipeline weight filtering TP-aware by allowing shape mismatches for parameters that have a vLLM `weight_loader` (see the sketch after this list).
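To make the last point concrete, here is a minimal sketch of TP-aware weight filtering. It assumes vLLM's convention of attaching a `weight_loader` attribute to sharded parameters; the function and variable names are illustrative, not the actual `pipeline_bagel.py` code:

```python
import torch

def filter_weights(model: torch.nn.Module,
                   state_dict: dict[str, torch.Tensor]):
    """Yield (name, tensor) pairs that are safe to load.

    Under TP, a parameter's local shape is only a shard of the checkpoint
    tensor, so a plain shape comparison would wrongly drop it. Any
    parameter carrying a vLLM ``weight_loader`` knows how to shard the
    full tensor itself, so shape mismatches are allowed for it.
    """
    params = dict(model.named_parameters())
    for name, tensor in state_dict.items():
        param = params.get(name)
        if param is None:
            continue  # not a parameter of this module
        has_loader = getattr(param, "weight_loader", None) is not None
        if param.shape == tensor.shape or has_loader:
            yield name, tensor
```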
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm_omni/diffusion/models/bagel/pipeline_bagel.py | Uses the global vocab size for safety checks and relaxes shape checks for TP-sharded parameters during weight loading. |
| vllm_omni/diffusion/models/bagel/bagel_transformer.py | Introduces TP-aware rotary embedding + MLP and swaps core layers to vLLM TP primitives; adds TP-aware LM weight loading. |
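On the first file's change, a hedged sketch of why the check must use the global vocab size, assuming the embedding is vLLM's `VocabParallelEmbedding` (which keeps the unsharded size in `org_vocab_size`; the helper name is illustrative):

```python
def check_vocab(embed, tokenizer_vocab_size: int) -> None:
    # Under TP the local embedding weight holds roughly
    # vocab_size / tp_size rows, so comparing the tokenizer length
    # against the local shard shape would fail even when the tokenizer
    # and model actually agree.
    global_vocab = embed.org_vocab_size  # unsharded size, same on every rank
    if tokenizer_vocab_size > global_vocab:
        raise ValueError(
            f"tokenizer vocab {tokenizer_vocab_size} exceeds "
            f"model vocab {global_vocab}")
```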
@hsliuustc0106 Ready to merge.
The GPU memory utilization indicates that some linear layers are not split.
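One quick way to confirm that, shown as a hedged sketch (the helper name is made up): walk the module tree and list anything still instantiated as a plain `torch.nn.Linear`, since vLLM's TP layers derive from its own `LinearBase` rather than `nn.Linear`.

```python
import torch.nn as nn

def find_unsharded_linears(model: nn.Module) -> list[str]:
    # Plain nn.Linear modules are replicated on every TP rank, which is
    # what shows up as higher-than-expected per-GPU memory.
    return [name for name, mod in model.named_modules()
            if type(mod) is nn.Linear]
```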
@ZJY0516 Mainly copied from `Qwen2Attention` in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/qwen2.py. BAGEL uses a different RoPE and adds a lot of qkv_moe modules, so I didn't inherit from it directly. I also updated the memory usage figures for the new version of this model.
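For readers following along, a condensed sketch of the layout being borrowed from `Qwen2Attention` (projections only; it assumes vLLM's TP process groups are already initialized, and omits BAGEL's custom RoPE and the extra qkv_moe projections, so it is a skeleton rather than the actual BAGEL class):

```python
import torch.nn as nn
from vllm.model_executor.layers.linear import (QKVParallelLinear,
                                               RowParallelLinear)

class AttentionProjections(nn.Module):
    """Illustrative skeleton of the Qwen2-style TP attention split."""

    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        head_dim = hidden_size // num_heads
        # Fused Q/K/V, column-sharded: each rank owns num_heads / tp_size
        # query heads plus its share of the KV heads.
        self.qkv_proj = QKVParallelLinear(
            hidden_size, head_dim, num_heads, num_kv_heads, bias=True)
        # Output projection, row-sharded: its forward all-reduces across
        # ranks, which is why splitting these layers cuts per-GPU memory.
        self.o_proj = RowParallelLinear(
            num_heads * head_dim, hidden_size, bias=False)
```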
2. **Launch Server**:
```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml
```
Is TP online serving supported via a CLI argument like `--tp 2`?
I'm afraid not 😂. The `tp` CLI argument can't override the YAML config:
```
(APIServer pid=1640082) INFO 02-10 01:16:51 [utils.py:261] non-default args: {'model_tag': 'ByteDance-Seed/BAGEL-7B-MoT', 'port': 8091, 'model': 'ByteDance-Seed/BAGEL-7B-MoT', 'tensor_parallel_size': 2}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni.py:117] Initializing stages for model: ByteDance-Seed/BAGEL-7B-MoT
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:197] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:234] Loaded OmniTransferConfig with 1 connector configurations
(APIServer pid=1640082) INFO 02-10 01:16:51 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'BagelForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.35, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'text', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_send_cache': True, 'kv_transfer_criteria': {'type': 'prefill_finished'}}, 'max_num_seqs': 1, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 52, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'diffusion', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'dit', 'gpu_memory_utilization': 0.55, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'image', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_recv_cache': True}}, 'engine_input_source': [0], 'final_output': True, 'final_output_type': 'image', 'is_comprehension': False, 'default_sampling_params': {'seed': 52}}
```
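So for now, TP has to be set per stage in the stage-config YAML. A hypothetical snippet, with field names inferred from the `stage_config` dicts in the log above (the actual `custom_bagel.yaml` schema may differ):

```yaml
# Hypothetical per-stage override; keys mirrored from the logged
# stage_config dicts, not from the real schema.
- stage_id: 0
  stage_type: llm
  engine_args:
    model_arch: BagelForConditionalGeneration
    tensor_parallel_size: 2   # set here, since --tp on the CLI is ignored
- stage_id: 1
  stage_type: diffusion
  engine_args:
    tensor_parallel_size: 1
```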
@lishunyang12 Can we override it in the future?
Purpose
#1253: Let BAGEL support TP.
Test Plan
Test Result
Model output: (details collapsed)
TP = 1 memory usage: (details collapsed)
TP = 2 memory usage: (details collapsed)