Merged
13 changes: 6 additions & 7 deletions docs/.nav.yml
@@ -26,17 +26,16 @@ nav:
- Configuration:
- configuration/README.md
- configuration/*
- Diffusion Acceleration:
- Overview: user_guide/diffusion_acceleration.md
- Acceleration Methods:
- TeaCache: user_guide/acceleration/teacache.md
- Cache-DiT: user_guide/acceleration/cache_dit_acceleration.md
- Parallelism Acceleration: user_guide/acceleration/parallelism_acceleration.md
- Models:
- models/supported_models.md
- Features:
- Sleep Mode: features/sleep_mode.md
- CPU Offloading for Diffusion Model: features/cpu_offload_diffusion.md
- Diffusion Features:
- Overview: user_guide/diffusion_acceleration.md
- TeaCache: user_guide/diffusion/teacache.md
- Cache-DiT: user_guide/diffusion/cache_dit_acceleration.md
- Parallelism Acceleration: user_guide/diffusion/parallelism_acceleration.md
- CPU Offloading: user_guide/diffusion/cpu_offload_diffusion.md
- Developer Guide:
- General:
- contributing/README.md
1 change: 1 addition & 0 deletions docs/api/README.md
@@ -82,6 +82,7 @@ Model execution components.
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_code_predictor_mtp.Qwen3OmniMoeTalkerCodePredictor][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_talker.Qwen3OmniMoeModel][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_talker.Qwen3OmniMoeTalkerForConditionalGeneration][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_talker.Qwen3OmniMoeTalkerSharedExpertWrapper][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_thinker.Qwen3MoeLLMForCausalLM][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_thinker.Qwen3MoeLLMModel][]
- [vllm_omni.model_executor.models.qwen3_omni.qwen3_omni_moe_thinker.Qwen3OmniMoeConditionalGenerationMixin][]
6 changes: 3 additions & 3 deletions docs/configuration/README.md
@@ -16,6 +16,6 @@ For introduction, please check [Introduction for stage config](./stage_configs.m

## Optimization Features

- **[TeaCache Configuration](../user_guide/acceleration/teacache.md)** - Enable TeaCache adaptive caching for DiT models to achieve 1.5x-2.0x speedup with minimal quality loss
- **[Cache-DiT Configuration](../user_guide/acceleration/cache_dit_acceleration.md)** - Enable Cache-DiT as cache acceleration backends for DiT models
- **[Parallelism Configuration](../user_guide/acceleration/parallelism_acceleration.md)** - Enable parallelism (e.g., sequence parallelism) for for DiT models
- **[TeaCache Configuration](../user_guide/diffusion/teacache.md)** - Enable TeaCache adaptive caching for DiT models to achieve 1.5x-2.0x speedup with minimal quality loss
- **[Cache-DiT Configuration](../user_guide/diffusion/cache_dit_acceleration.md)** - Enable Cache-DiT as cache acceleration backends for DiT models
- **[Parallelism Configuration](../user_guide/diffusion/parallelism_acceleration.md)** - Enable parallelism (e.g., sequence parallelism) for DiT models
@@ -23,7 +23,7 @@ if __name__ == "__main__":
m = Omni(model="Qwen/Qwen-Image", enable_cpu_offload=True)
```

- **CLI**: pass `--dit-cpu-offload` to the diffusion service entrypoint.
- **CLI**: pass `--enable-cpu-offload` to the diffusion service entrypoint.

## Known Limitations
- Cold start latency increases by over one minute for some models (e.g., Qwen-Image)
18 changes: 9 additions & 9 deletions docs/user_guide/diffusion_acceleration.md
@@ -6,8 +6,8 @@ vLLM-Omni supports various cache acceleration methods to speed up diffusion mode

vLLM-Omni currently supports two main cache acceleration backends:

1. **[TeaCache](acceleration/teacache.md)** - Hook-based adaptive caching that caches transformer computations when consecutive timesteps are similar
2. **[Cache-DiT](acceleration/cache_dit_acceleration.md)** - Library-based acceleration using multiple techniques:
1. **[TeaCache](diffusion/teacache.md)** - Hook-based adaptive caching that caches transformer computations when consecutive timesteps are similar
2. **[Cache-DiT](diffusion/cache_dit_acceleration.md)** - Library-based acceleration using multiple techniques:
- **DBCache** (Dual Block Cache): Caches intermediate transformer block outputs based on residual differences
- **TaylorSeer**: Uses Taylor expansion-based forecasting for faster inference
- **SCM** (Step Computation Masking): Selectively computes steps based on adaptive masking
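The caching decision these methods share can be sketched in isolation. The snippet below is an illustrative toy, not the vLLM-Omni API (all names are invented for illustration): a block's cached output is reused whenever the relative change in its input between consecutive timesteps falls below a threshold, which is the residual-difference idea behind TeaCache and DBCache.

```python
# Toy sketch of residual-difference caching (TeaCache/DBCache style).
# Illustrative only -- not the vLLM-Omni API; all names are invented.

def should_reuse_cache(prev_input, cur_input, threshold=0.05):
    """Reuse the cached output when the relative L1 change between
    consecutive timestep inputs falls below `threshold`."""
    num = sum(abs(a - b) for a, b in zip(prev_input, cur_input))
    den = sum(abs(a) for a in prev_input) or 1.0
    return num / den < threshold

class CachedBlock:
    def __init__(self, block, threshold=0.05):
        self.block = block              # the expensive transformer block
        self.threshold = threshold
        self.prev_input = None
        self.cached_output = None
        self.skipped = 0                # how many computations were skipped

    def __call__(self, x):
        if self.prev_input is not None and should_reuse_cache(
            self.prev_input, x, self.threshold
        ):
            self.skipped += 1           # similar timestep: reuse the cache
        else:
            self.cached_output = self.block(x)   # recompute and cache
        self.prev_input = list(x)
        return self.cached_output

# Usage: an "expensive" block called over three denoising timesteps;
# the second input barely changes, so its computation is skipped.
block = CachedBlock(lambda x: [v * 2.0 for v in x])
outputs = [block([1.0, 1.0]), block([1.001, 1.0]), block([5.0, 5.0])]
```

In a real pipeline the skipped work is an entire transformer block (or stack of blocks) per timestep, which is where the 1.5x-2.0x speedups come from.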
@@ -16,11 +16,11 @@ Both methods can provide significant speedups (typically **1.5x-2.0x**) while ma

vLLM-Omni also supports parallelism methods for diffusion models, including:

1. [Ulysses-SP](acceleration/parallelism_acceleration.md#ulysses-sp) - splits the input along the sequence dimension and uses all-to-all communication to allow each device to compute only a subset of attention heads.
1. [Ulysses-SP](diffusion/parallelism_acceleration.md#ulysses-sp) - splits the input along the sequence dimension and uses all-to-all communication to allow each device to compute only a subset of attention heads.

2. [Ring-Attention](acceleration/parallelism_acceleration.md#ring-attention) - splits the input along the sequence dimension and uses ring-based P2P communication to accumulate attention results, keeping the sequence dimension sharded.
2. [Ring-Attention](diffusion/parallelism_acceleration.md#ring-attention) - splits the input along the sequence dimension and uses ring-based P2P communication to accumulate attention results, keeping the sequence dimension sharded.

3. [CFG-Parallel](acceleration/parallelism_acceleration.md#cfg-parallel) - runs the positive/negative prompts of classifier-free guidance (CFG) on different devices, then merges on a single device to perform the scheduler step.
3. [CFG-Parallel](diffusion/parallelism_acceleration.md#cfg-parallel) - runs the positive/negative prompts of classifier-free guidance (CFG) on different devices, then merges on a single device to perform the scheduler step.
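The data movement behind Ulysses-SP can be illustrated standalone. The snippet below is a local NumPy simulation, not the vLLM-Omni implementation (shapes and names are illustrative): each of `P` ranks starts with a sequence shard of all `H` heads, and the all-to-all leaves each rank with the full sequence for `H/P` heads, so attention can be computed locally per head group.

```python
import numpy as np

# Toy simulation of the Ulysses-SP all-to-all (not the vLLM-Omni code).
# P ranks each hold a sequence shard [S/P, H, D] covering all H heads;
# after the all-to-all each rank holds the full sequence for H/P heads,
# i.e. a tensor of shape [S, H/P, D].

P, S, H, D = 2, 4, 4, 3
x = np.arange(S * H * D, dtype=np.float32).reshape(S, H, D)

# Before: rank r holds the r-th sequence shard of shape [S/P, H, D].
seq_shards = np.split(x, P, axis=0)

# All-to-all: every rank sends head-group r of its shard to rank r.
# Simulated locally: rank r gathers head-group r from every seq shard
# and concatenates along the sequence dimension.
head_shards = [
    np.concatenate(
        [shard[:, r * H // P:(r + 1) * H // P] for shard in seq_shards],
        axis=0,
    )
    for r in range(P)
]

# Each rank now sees the full sequence for its subset of heads, so
# attention is exact per head group. Concatenating the head groups
# back along the head dimension recovers the original tensor.
reassembled = np.concatenate(head_shards, axis=1)
```

Ring-Attention keeps the sequence dimension sharded instead, trading the all-to-all for ring-based P2P accumulation of partial attention results.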

## Quick Comparison

@@ -197,7 +197,7 @@ outputs = omni.generate(prompt="turn this cat to a dog",

For detailed information on each acceleration method:

- **[TeaCache Guide](acceleration/teacache.md)** - Complete TeaCache documentation, configuration options, and best practices
- **[Cache-DiT Acceleration Guide](acceleration/cache_dit_acceleration.md)** - Comprehensive Cache-DiT guide covering DBCache, TaylorSeer, SCM, and configuration parameters
- **[Sequence Parallelism](acceleration/parallelism_acceleration.md#sequence-parallelism)** - Guidance on how to set sequence parallelism with configuration.
- **[CFG-Parallel](acceleration/parallelism_acceleration.md#cfg-parallel)** - Guidance on how to set CFG-Parallel to run positive/negative branches across ranks.
- **[TeaCache Guide](diffusion/teacache.md)** - Complete TeaCache documentation, configuration options, and best practices
- **[Cache-DiT Acceleration Guide](diffusion/cache_dit_acceleration.md)** - Comprehensive Cache-DiT guide covering DBCache, TaylorSeer, SCM, and configuration parameters
- **[Sequence Parallelism](diffusion/parallelism_acceleration.md#sequence-parallelism)** - Guidance on how to set sequence parallelism with configuration.
- **[CFG-Parallel](diffusion/parallelism_acceleration.md#cfg-parallel)** - Guidance on how to set CFG-Parallel to run positive/negative branches across ranks.