Commit 785a774

hsliuustc0106 authored and e1ijah1 committed
update desgin docs (vllm-project#269)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
1 parent 0dfde9e commit 785a774

6 files changed

Lines changed: 231 additions & 207 deletions

docs/.nav.yml

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ nav:
  - Feature Design:
  - design/feature/multi_request_streaming_design.md
  - design/feature/omni_connector_design.md
+ - design/feature/ray_based_execution.md
  - Module Design:
  - design/module/ar_module_design.md
  - design/module/dit_module_design.md

docs/design/architecture_overview.md

Lines changed: 29 additions & 6 deletions
@@ -3,7 +3,10 @@
  This document outlines the architectural design for vLLM-Omni.

  <p align="center">
- <img src="../source/architecture/omni-modality-model-architecture.png" alt="Omni-Modality Model Architecture" width="80%">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/omni-modality-model-architecture.png">
+ <img alt="Omni-Modality Model Architecture" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/omni-modality-model-architecture.png" width=55%>
+ </picture>
  </p>

  # Goals
@@ -22,26 +25,41 @@ According to analysis for current popular open-source models, most of them have

  **DiT as a main structure, with AR as text encoder (e.g.: Qwen-Image)**
  A powerful image generation foundation model capable of complex text rendering and precise image editing.
+
  <p align="center">
- <img src="../source/architecture/ar-main-architecture.png" alt="Qwen-Image" width="30%">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/ar-main-architecture.png">
+ <img alt="Qwen-Image" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/ar-main-architecture.png" width=30%>
+ </picture>
  </p>

  **AR as a main structure, with DiT as multi-modal generator (e.g. BAGEL)**
  A unified multimodal comprehension and generation model, with cot text output and visual generation.
+
  <p align="center">
- <img src="../source/architecture/dit-main-architecture.png" alt="Bagel" width="30%">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/dit-main-architecture.png">
+ <img alt="Bagel" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/dit-main-architecture.png" width=30%>
+ </picture>
  </p>

  **AR+DiT (e.g. Qwen-Omni)**
  A natively end-to-end omni-modal LLM for multimodal inputs (text/image/audio/video...) and outputs (text/audio...).
+
  <p align="center">
- <img src="../source/architecture/ar-dit-main-architecture.png" alt="Qwen-Omni" width="30%">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/ar-dit-main-architecture.png">
+ <img alt="Qwen-Omni" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/ar-dit-main-architecture.png" width=30%>
+ </picture>
  </p>

  # vLLM-Omni main architecture

  <p align="center">
- <img src="../source/architecture/vllm-omni-main-architecture.png" alt="vLLM-Omni Main Architecture" width="80%">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-main-architecture.png">
+ <img alt="vLLM-Omni Main Architecture" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-main-architecture.png" width=55%>
+ </picture>
  </p>

  ## Key Components
@@ -89,7 +107,12 @@ vLLM-Omni is designed to be flexible and straightforward for users:

  If you use vLLM, then you know how to use vLLM-Omni from Day 0:

- ![vLLM-Omni interface design](../source/architecture/vllm-omni-user-interface.png)
+ <p align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-user-interface.png">
+ <img alt="vLLM-Omni interface design" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-user-interface.png" width=55%>
+ </picture>
+ </p>

  Taking **Qwen3-Omni** as an example:

docs/design/connectors/omni_connector_design.md

Lines changed: 0 additions & 200 deletions
This file was deleted.
