Merged
18 changes: 18 additions & 0 deletions examples/offline_inference/bagel/README.md
@@ -151,6 +151,24 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can

------

#### Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](../../../vllm_omni/model_executor/stage_configs/bagel.yaml)).

1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`).
2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`).

Example configuration for TP=2 on GPUs 0 and 1:
```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```
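Applied together, the two steps amount to setting one field under `engine_args` and one under `runtime` for a stage. A minimal sketch of that edit in Python (the helper name and the exact nesting are assumptions based on the yaml fragment above):

```python
def enable_tp(stage_cfg: dict, tp_size: int, devices: str) -> dict:
    """Set tensor_parallel_size and devices for one stage config.

    Assumes the layout shown above: engine_args.tensor_parallel_size
    and runtime.devices (comma-separated GPU ids).
    """
    assert tp_size == len(devices.split(",")), "one GPU id per TP rank"
    cfg = dict(stage_cfg)  # shallow copy is enough for this sketch
    cfg.setdefault("engine_args", {})["tensor_parallel_size"] = tp_size
    cfg.setdefault("runtime", {})["devices"] = devices
    return cfg

# Example: upgrade a single-GPU stage to TP=2 on GPUs 0 and 1.
cfg = enable_tp({"engine_args": {"gpu_memory_utilization": 0.35},
                 "runtime": {"max_batch_size": 1}}, 2, "0,1")
```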

------

#### 🔗 Runtime Configuration

| Parameter | Value | Description |
21 changes: 19 additions & 2 deletions examples/online_serving/bagel/README.md
@@ -22,12 +22,29 @@ cd /workspace/vllm-omni/examples/online_serving/bagel
bash run_server.sh
```

If you have a custom stage-configs file, launch the server with the command below:

```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
```

#### 🚀 Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) for the server.

1. **Modify Stage Config**: Create or modify a stage configuration yaml (e.g., [`bagel.yaml`](../../../vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).

```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```

2. **Launch Server**:
```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml
```
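Note that the launch command itself only changes through the `--stage-configs-path` argument; TP is configured entirely in the yaml. A small sketch that assembles the command (flag names copied from the invocation above; the helper function is hypothetical):

```python
def serve_command(model: str, port: int, stage_configs_path: str) -> str:
    # Flags mirror the CLI invocation shown above; tensor parallelism
    # itself is set in the stage-configs yaml, not on the command line.
    return (f"vllm serve {model} --omni --port {port} "
            f"--stage-configs-path {stage_configs_path}")

cmd = serve_command("ByteDance-Seed/BAGEL-7B-MoT", 8091,
                    "/path/to/your/custom_bagel.yaml")
```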
Collaborator:
Is TP online serving supported by CLI argument like --tp 2?

Collaborator (Author):
I'm afraid not 😂; the CLI argument `--tp` can't overwrite the yaml config:

(APIServer pid=1640082) INFO 02-10 01:16:51 [utils.py:261] non-default args: {'model_tag': 'ByteDance-Seed/BAGEL-7B-MoT', 'port': 8091, 'model': 'ByteDance-Seed/BAGEL-7B-MoT', 'tensor_parallel_size': 2}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni.py:117] Initializing stages for model: ByteDance-Seed/BAGEL-7B-MoT
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:197] Auto-configuring SharedMemoryConnector for edge ('0', '1')
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:234] Loaded OmniTransferConfig with 1 connector configurations
(APIServer pid=1640082) INFO 02-10 01:16:51 [factory.py:46] Created connector: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [initialization.py:60] Created connector for 0 -> 1: SharedMemoryConnector
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'thinker', 'model_arch': 'BagelForConditionalGeneration', 'worker_type': 'ar', 'scheduler_cls': 'vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler', 'gpu_memory_utilization': 0.35, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'text', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_send_cache': True, 'kv_transfer_criteria': {'type': 'prefill_finished'}}, 'max_num_seqs': 1, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'text', 'is_comprehension': True, 'default_sampling_params': {'temperature': 0.4, 'top_p': 0.9, 'top_k': 1, 'max_tokens': 2048, 'seed': 52, 'detokenize': True, 'repetition_penalty': 1.05}}
(APIServer pid=1640082) INFO 02-10 01:16:51 [omni_stage.py:239] [OmniStage] stage_config: {'stage_id': 1, 'stage_type': 'diffusion', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'dit', 'gpu_memory_utilization': 0.55, 'enforce_eager': True, 'trust_remote_code': True, 'engine_output_type': 'image', 'distributed_executor_backend': 'mp', 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'tensor_parallel_size': 1, 'omni_kv_config': {'need_recv_cache': True}}, 'engine_input_source': [0], 'final_output': True, 'final_output_type': 'image', 'is_comprehension': False, 'default_sampling_params': {'seed': 52}}
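The log is consistent with the per-stage yaml taking precedence over CLI arguments: `tensor_parallel_size: 2` shows up in the non-default args, yet both stages still report `tensor_parallel_size: 1`. An illustrative sketch of that precedence (not the actual vllm-omni merge code):

```python
def resolve_engine_args(cli_args: dict, stage_yaml: dict) -> dict:
    # Yaml wins: CLI values only fill keys the stage config omits,
    # so `--tp 2` is dropped when the yaml pins tensor_parallel_size.
    merged = dict(cli_args)
    merged.update(stage_yaml)
    return merged

merged = resolve_engine_args({"tensor_parallel_size": 2},
                             {"tensor_parallel_size": 1, "devices": "0"})
```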

Collaborator (Author):
@lishunyang12 Can we overwrite it in the future?


### Send Multi-modal Request

Get into the bagel folder: