
[Bagel]: Support TP #1293

Merged
hsliuustc0106 merged 9 commits into vllm-project:main from princepride:change-bagel-mlp-tp
Feb 10, 2026

Conversation

@princepride
Collaborator

@princepride princepride commented Feb 9, 2026

Purpose

Add tensor-parallel (TP) support for Bagel (#1253).

Test Plan

```python
from PIL import Image
from vllm_omni.entrypoints.omni_diffusion import OmniDiffusion
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

def main():
    # Image-to-image edit with the BAGEL pipeline sharded across 2 GPUs.
    image = Image.open("women.jpg")
    pipeline = OmniDiffusion(
        model="../models/BAGEL-7B-MoT",
        parallel_config={
            "tensor_parallel_size": 2
        }
    )
    prompts = {
        "prompt": "Let the woman wear a blue dress",
        "multi_modal_data": {"image": image},
    }

    # Fixed seed so TP=1 and TP=2 outputs can be compared directly.
    result = pipeline.generate(
        prompts,
        OmniDiffusionSamplingParams(
            seed=52
        )
    )
    result[0].images[0].save("bagel_i2i_output.png")

if __name__ == "__main__":
    main()
```

Test Result

Model output: (image)

TP = 1 memory usage: (screenshot)

TP = 2 memory usage: (screenshot)

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 950ce0cc81

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@princepride
Collaborator Author

@hsliuustc0106 PTAL

@hsliuustc0106
Collaborator

update examples as well

Contributor

Copilot AI left a comment

Pull request overview

Adds tensor-parallel (TP) compatibility for the BAGEL diffusion pipeline by replacing non-TP-aware HF components with vLLM TP layers and updating weight-loading / vocab checks accordingly (addresses #1253).

Changes:

  • Switch BAGEL’s Qwen2 MoT MLP, embedding, norms, and RoPE to vLLM TP-aware implementations and add TP-aware load_weights on the BAGEL LM module (the MLP swap is sketched after this list).
  • Update BAGEL pipeline vocab mismatch checks to use global vocab_size (instead of local embedding shard size under TP).
  • Make BAGEL pipeline weight filtering TP-aware by allowing shape mismatches for parameters that have a vLLM weight_loader.
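
A minimal sketch of the MLP swap from the first bullet, in the style of vLLM's Qwen2MLP; the class name is illustrative, not the exact code merged in this PR:

```python
import torch
from torch import nn

from vllm.model_executor.layers.activation import SiluAndMul
from vllm.model_executor.layers.linear import (MergedColumnParallelLinear,
                                               RowParallelLinear)


class BagelTPMLPSketch(nn.Module):
    """Illustrative TP-aware SwiGLU MLP in the style of vLLM's Qwen2MLP."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # gate_proj and up_proj are fused into one column-parallel linear;
        # each TP rank holds intermediate_size / tp_size of the columns.
        self.gate_up_proj = MergedColumnParallelLinear(
            hidden_size, [intermediate_size] * 2, bias=False)
        # down_proj is row-parallel; its forward all-reduces across ranks.
        self.down_proj = RowParallelLinear(
            intermediate_size, hidden_size, bias=False)
        self.act_fn = SiluAndMul()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_up, _ = self.gate_up_proj(x)
        x = self.act_fn(gate_up)
        x, _ = self.down_proj(x)
        return x
```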

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vllm_omni/diffusion/models/bagel/pipeline_bagel.py | Uses global vocab size for safety checks and relaxes shape checks for TP-sharded parameters during weight loading. |
| vllm_omni/diffusion/models/bagel/bagel_transformer.py | Introduces TP-aware rotary embedding + MLP and swaps core layers to vLLM TP primitives; adds TP-aware LM weight loading. |
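
The relaxed shape check described for pipeline_bagel.py can be illustrated with a hypothetical sketch; the helper name and surrounding structure are assumptions, not the merged code:

```python
from vllm.model_executor.model_loader.weight_utils import default_weight_loader

def filter_and_load(params_dict, weights):
    """Load checkpoint tensors, deferring to per-parameter TP loaders."""
    for name, loaded_weight in weights:
        if name not in params_dict:
            continue
        param = params_dict[name]
        # TP-sharded parameters carry a vLLM weight_loader that slices the
        # full checkpoint tensor down to this rank's shard, so a shape
        # mismatch against the local param is expected and allowed.
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        if (weight_loader is default_weight_loader
                and param.shape != loaded_weight.shape):
            continue  # genuine mismatch on a full (unsharded) tensor
        weight_loader(param, loaded_weight)
```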


@princepride princepride added the ready label to trigger buildkite CI label Feb 10, 2026
@Gaohan123 Gaohan123 added this to the v0.16.0 milestone Feb 10, 2026
@princepride
Collaborator Author

@hsliuustc0106 Ready to merge.

@princepride princepride enabled auto-merge (squash) February 10, 2026 07:28
@princepride princepride disabled auto-merge February 10, 2026 07:30
@ZJY0516
Collaborator

ZJY0516 commented Feb 10, 2026

The GPU memory utilization indicates that some linear layers are not split.

@princepride
Collaborator Author

@ZJY0516 It is mainly copied from Qwen2Attention in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/qwen2.py. Because Bagel uses a different RoPE and adds a lot of qkv_moe modules, I didn't inherit from it. I also updated the memory-usage figures for the new version of the model.
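
For context, a minimal sketch of the Qwen2Attention-style TP wiring referenced above, using vLLM's parallel linear layers; Bagel's actual class additionally swaps in its own RoPE and qkv_moe modules, which are omitted here, and the class name is illustrative:

```python
from torch import nn

from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.linear import (QKVParallelLinear,
                                               RowParallelLinear)


class TPAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        tp_size = get_tensor_model_parallel_world_size()
        self.head_dim = hidden_size // num_heads
        # Query and KV heads are divided across TP ranks.
        self.num_heads = num_heads // tp_size
        self.num_kv_heads = max(1, num_kv_heads // tp_size)
        # Fused, column-sharded q/k/v projection.
        self.qkv_proj = QKVParallelLinear(
            hidden_size, self.head_dim, num_heads, num_kv_heads, bias=True)
        # Row-sharded output projection with an implicit all-reduce.
        self.o_proj = RowParallelLinear(
            num_heads * self.head_dim, hidden_size, bias=False)
```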

2. **Launch Server**:
```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml
```
Contributor

Is TP online serving supported by CLI argument like --tp 2?

Collaborator Author

I'm afraid not 😂; the CLI tp argument can't override the YAML config:

```
(APIServer pid=1640082) INFO 02-10 01:16:51 [utils.py:261] non-default args: {'model_tag': 'ByteDance-Seed/BAGEL-7B-MoT', 'port': 8091, 'model': 'ByteDance-Seed/BAGEL-7B-MoT', 'tensor_parallel_size': 2}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni.py:117] Initializing stages for model: ByteDance-Seed/BAGEL-7B-MoT
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:197] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:234] Loaded OmniTransferConfig with 1 connector configurations
(APIServer pid=1640082) INFO 02-10 01:16:51 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'BagelForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.35, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'text', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_send_cache': True, 'kv_transfer_criteria': {'type': 'prefill_finished'}}, 'max_num_seqs': 1, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 52, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'diffusion', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'dit', 'gpu_memory_utilization': 0.55, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'image', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_recv_cache': True}}, 'engine_input_source': [0], 'final_output': True, 'final_output_type': 'image', 'is_comprehension': False, 'default_sampling_params': {'seed': 52}}
```
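
For reference, a hypothetical excerpt of a stage-configs YAML where TP actually takes effect; the keys mirror the stage_config dicts logged above, but the exact file schema is an assumption, not confirmed here:

```yaml
# Hypothetical stage-config excerpt; keys mirror the logged stage_config
# dicts above. TP is read from here, not from the CLI tp flag.
- stage_id: 0
  stage_type: llm
  engine_args:
    model_arch: BagelForConditionalGeneration
    tensor_parallel_size: 2
- stage_id: 1
  stage_type: diffusion
  engine_args:
    model_stage: dit
    tensor_parallel_size: 2
```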

Collaborator Author

@lishunyang12 Can we override it in the future?

@hsliuustc0106 hsliuustc0106 merged commit 8228b5a into vllm-project:main Feb 10, 2026
7 checks passed
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026