Merged
1 change: 0 additions & 1 deletion docs/.nav.yml
Original file line number Diff line number Diff line change
@@ -49,7 +49,6 @@ nav:
- design/architecture_overview.md
- Feature Design:
- design/feature/disaggregated_inference.md
- design/feature/multi_request_streaming.md
- design/feature/ray_based_execution.md
- Module Design:
- design/module/ar_module.md
47 changes: 0 additions & 47 deletions docs/design/feature/multi_request_streaming.md

This file was deleted.

22 changes: 10 additions & 12 deletions docs/design/feature/ray_based_execution.md
@@ -1,14 +1,17 @@
# Distributed utils

This directory (vllm_omni/distributed/ray_utils) contains utilities for distributed execution in vllm-omni, supporting both **Ray** and **Multiprocessing** backends.

## 1. Ray Utils
## 1. Installation
```bash
pip install "ray[default]"
```
## 2. Ray Utils

The `ray_utils` module provides helper functions for managing Ray clusters and actors, which is essential for:
* **Multi-node deployment**: Running pipeline stages across different physical machines.
* **Resource management**: Efficient GPU/CPU allocation.

### 1.1 Basic Usage
### 2.1 Basic Usage

To use the Ray backend, specify `worker_backend="ray"` when initializing the engine.

@@ -21,7 +24,7 @@ vllm serve Qwen/Qwen2.5-Omni-7B \
--ray-address auto
```

### 1.2 Cluster Setup
### 2.2 Cluster Setup

**Step 1: Start Head Node**
Run this on your primary machine:
@@ -38,24 +41,19 @@ ray start --address=<HEAD_NODE_IP>:6399
> **Tip**: For a complete cluster setup script, refer to the vLLM example:
> [run_cluster.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/run_cluster.sh)

### 1.3 Distributed Connector Support
### 2.3 Distributed Connector Support

When running on Ray, the system automatically adapts its communication strategy:

* **Cross-Node**: `MooncakeConnector` is recommended (requires separate configuration).
* **Same-Node**: `SharedMemoryConnector` can still be used for efficiency, as can Ray's native object store (Plasma).
* **SHM threshold default differs**: when `worker_backend="ray"`, the SharedMemoryConnector default threshold is set to `sys.maxsize`, which forces payloads to go inline (no SHM). Override `shm_threshold_bytes` in the connector config if you want SHM for Ray runs.

### 1.4 Internal Helpers
### 2.4 Internal Helpers

* **`initialize_ray_cluster`**: Connects to an existing Ray cluster or starts a local one.

## 2. Troubleshooting
## 3. Troubleshooting

* **Connection Issues**: Ensure the Ray head node is reachable and the required ports (6399 in this example) are open.
* **Version Mismatch**: Ensure all nodes run the same version of Ray and Python.
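The version-mismatch check above can be automated once you have collected `(ray_version, python_version)` pairs from each node (how you gather them is out of scope here). The helper below is a sketch with invented names, not part of vllm-omni:

```python
def check_versions(node_versions: dict[str, tuple[str, str]]) -> list[str]:
    """Return the names of nodes whose (ray, python) versions differ
    from those of the first node in the mapping."""
    baseline = next(iter(node_versions.values()))
    return [name for name, versions in node_versions.items()
            if versions != baseline]
```

An empty result means all nodes agree; any names returned identify the nodes to upgrade or downgrade.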

### Installation
```bash
pip install "ray[default]"
```
1 change: 0 additions & 1 deletion docs/design/index.md
@@ -9,7 +9,6 @@ This section contains design documents and architecture specifications for vLLM-
## Feature Design Documents

- [Disaggregated Inference](feature/disaggregated_inference.md)
- [Multi-Request Streaming](feature/multi_request_streaming.md)
- [Ray-based Execution](feature/ray_based_execution.md)

## Module Design Documents