Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/).
docs/supported_models/support_new_models.md
+20 -8 (20 additions & 8 deletions)
@@ -21,8 +21,8 @@ standard LLM support:
21
21
in [model_config.py](https://github.com/sgl-project/sglang/blob/0ab3f437aba729b348a683ab32b35b214456efc7/python/sglang/srt/configs/model_config.py#L561)
22
22
to return `True` for your model.
23
23
24
-
2. **Register a new chat-template**
25
-
See [conversation.py](https://github.com/sgl-project/sglang/blob/86a779dbe9e815c02f71ea82574608f6eae016b5/python/sglang/srt/conversation.py)
24
+
2. **Register a new chat-template**:
25
+
Only needed when your default chat template cannot accept images as input: register a new chat template, along with the corresponding matching function, in [conversation.py](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/conversation.py).
26
26
27
27
3. **Multimodal Data Processor**:
28
28
Define a new `Processor` class that inherits from `BaseMultimodalProcessor` and register this processor as your
@@ -35,16 +35,18 @@ standard LLM support:
35
35
expanded (if necessary) and padded with multimodal-data-hashes so that SGLang can recognize different multimodal data
36
36
with `RadixAttention`.
37
37
38
-
5. **Adapt to Vision Attention**:
38
+
5. **Handle Image Feature Extraction**:
39
+
Implement a `get_image_feature` function for your new model, which extracts image features from raw image data and converts them into the embeddings used by the language model.
40
+
41
+
6. **Adapt to Vision Attention**:
39
42
Adapt the multi-headed `Attention` of ViT with SGLang’s `VisionAttention`.
40
43
41
44
You can refer to [Qwen2VL](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/qwen2_vl.py) or
42
45
other MLLM implementations. These models demonstrate how to correctly handle both multimodal and textual inputs.
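The idea of padding placeholder tokens with multimodal-data hashes can be illustrated with a small, self-contained sketch. This is not SGLang's implementation: `IMAGE_TOKEN_ID`, `VOCAB_SIZE`, `image_pad_value`, and the standalone `pad_input_ids` below are hypothetical names chosen for illustration. It only shows why hash-derived pad values let a prefix cache tell two prompts apart when they share the same text but contain different images.

```python
import hashlib

# Hypothetical values for illustration only; a real model defines its own.
IMAGE_TOKEN_ID = 32000
VOCAB_SIZE = 32001


def image_pad_value(image_bytes: bytes) -> int:
    """Derive a stable integer from the raw image content."""
    digest = hashlib.sha256(image_bytes).digest()
    # Offset by the vocabulary size so a pad value never equals a real token id.
    return VOCAB_SIZE + int.from_bytes(digest[:4], "big") % (1 << 30)


def pad_input_ids(input_ids, images, tokens_per_image):
    """Expand each image placeholder into `tokens_per_image` copies of that
    image's hash-derived pad value; text tokens pass through unchanged."""
    padded = []
    image_iter = iter(images)
    for token in input_ids:
        if token == IMAGE_TOKEN_ID:
            pad = image_pad_value(next(image_iter))
            padded.extend([pad] * tokens_per_image)
        else:
            padded.append(token)
    return padded


prompt = [1, IMAGE_TOKEN_ID, 5]
ids_cat = pad_input_ids(prompt, [b"cat image bytes"], tokens_per_image=3)
ids_dog = pad_input_ids(prompt, [b"dog image bytes"], tokens_per_image=3)
```

With this scheme the same image always produces the same padded sequence, so a cached prefix can be reused, while a different image changes the padded ids and therefore does not match the cached prefix.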
43
46
44
-
You should test the new MLLM locally against Hugging Face models. See the [
45
-
`mmmu`](https://github.com/sgl-project/sglang/tree/main/benchmark/mmmu) benchmark for an example.
47
+
## Testing and Debugging
46
48
47
-
## Test the Correctness
49
+
Please note all your testing and benchmarking results in the PR description.
48
50
49
51
### Interactive Debugging
50
52
@@ -65,14 +67,21 @@ should give the same text output and very similar prefill logits:
65
67
To ensure the new model is well maintained, add it to the test suite by including it in the `ALL_OTHER_MODELS` list in
66
68
the [test_generation_models.py](https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_generation_models.py)
67
69
file, test the new model on your local machine and report the results on demonstrative benchmarks (GSM8K, MMLU, MMMU,
68
-
MMMU-Pro, etc.) in your PR.
70
+
MMMU-Pro, etc.) in your PR. \\
71
+
For VLMs, also include a test in `test_vision_openai_server_{x}.py` (e.g. [test_vision_openai_server_a.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server_a.py), [test_vision_openai_server_b.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server_b.py)).
72
+
69
73
70
-
This is the command to test a new model on your local machine:
74
+
This is an example command for testing a new model on your local machine:
- **(Required) MMMU**: Follow the MMMU benchmark [README.md](https://github.com/sgl-project/sglang/blob/main/benchmark/mmmu/README.md) to get an SGLang vs. HF Transformers accuracy comparison. The accuracy score from the SGLang run should not be much lower than that from the HF Transformers run. Similarly, follow https://docs.sglang.ai/developer_guide/benchmark_and_profiling.html to get a performance comparison: TTFT and throughput must meet or exceed baselines (e.g., HF Transformers).
83
+
- **(Optional) Other evals**: If you ran other evals, please note the results in the PR description.
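One way to make the acceptance criteria above concrete before writing up the PR description is a small comparison helper. This is a minimal sketch, not part of SGLang: the function name, the metric keys, and the `max_accuracy_drop` tolerance of 0.01 are assumptions (the guideline only says the score should not be "much lower"), and the numbers in the usage lines are placeholders rather than measured results.

```python
def check_against_baseline(sglang, hf, max_accuracy_drop=0.01):
    """Compare SGLang metrics against an HF Transformers baseline.

    Both arguments are dicts with the (hypothetical) keys "accuracy",
    "ttft_s" (time to first token, lower is better) and
    "throughput_tok_s" (higher is better). Returns a list of failure
    messages; an empty list means every check passed.
    """
    failures = []
    # Accuracy may be slightly lower, within the assumed tolerance.
    if sglang["accuracy"] < hf["accuracy"] - max_accuracy_drop:
        failures.append("accuracy is more than the allowed drop below baseline")
    # TTFT must be no worse (no larger) than the baseline.
    if sglang["ttft_s"] > hf["ttft_s"]:
        failures.append("TTFT is slower than baseline")
    # Throughput must be at least the baseline.
    if sglang["throughput_tok_s"] < hf["throughput_tok_s"]:
        failures.append("throughput is lower than baseline")
    return failures


# Placeholder numbers for illustration only, not real benchmark results.
baseline = {"accuracy": 0.525, "ttft_s": 0.9, "throughput_tok_s": 300.0}
candidate = {"accuracy": 0.520, "ttft_s": 0.2, "throughput_tok_s": 1500.0}
regressed = {"accuracy": 0.400, "ttft_s": 0.2, "throughput_tok_s": 1500.0}

passing = check_against_baseline(candidate, baseline)
failing = check_against_baseline(regressed, baseline)
```

Here `passing` is empty, while `failing` contains one message about the accuracy drop; the returned messages can be pasted next to the raw numbers in the PR description.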
84
+
76
85
## Port a Model from vLLM to SGLang
77
86
78
87
The [vLLM Models Directory](https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models) is a valuable
Add the model to the table of supported models in [generative_models.md](https://github.com/sgl-project/sglang/blob/main/docs/supported_models/generative_models.md) or [multimodal_language_models.md](https://github.com/sgl-project/sglang/blob/main/docs/supported_models/multimodal_language_models.md).
140
+
129
141
---
130
142
131
143
By following these guidelines, you can add support for new language models and multimodal large language models in