Merged
8 changes: 4 additions & 4 deletions doc/source/models/builtin/llm/codegeex4.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/codegeex4-all-9b
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/codegeex4-all-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b>`__
+- **Model ID:** zai-org/codegeex4-all-9b
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/codegeex4-all-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -37,8 +37,8 @@ Model Spec 2 (ggufv2, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** IQ2_M, IQ3_M, Q4_K_M, Q5_K_M, Q6_K_L, Q8_0
- **Engines**: vLLM, llama.cpp
-- **Model ID:** THUDM/codegeex4-all-9b-GGUF
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/codegeex4-all-9b-GGUF>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b-GGUF>`__
+- **Model ID:** zai-org/codegeex4-all-9b-GGUF
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/codegeex4-all-9b-GGUF>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b-GGUF>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/cogagent.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: Transformers
-- **Model ID:** THUDM/cogagent-9b-20241220
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/cogagent-9b-20241220>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220>`__
+- **Model ID:** zai-org/cogagent-9b-20241220
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/cogagent-9b-20241220>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
2 changes: 2 additions & 0 deletions doc/source/models/builtin/llm/deepseek-v3-0324.rst
@@ -45,6 +45,7 @@ chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format awq --quantization ${quantization}


Model Spec 3 (mlx, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

@@ -59,3 +60,4 @@ Execute the following command to launch the model, remember to replace ``${quant
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format mlx --quantization ${quantization}

12 changes: 6 additions & 6 deletions doc/source/models/builtin/llm/glm-4.1v-thinking.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4.1V-9B-Thinking
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.1V-9B-Thinking>`__
+- **Model ID:** zai-org/GLM-4.1V-9B-Thinking
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.1V-9B-Thinking>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -37,8 +37,8 @@ Model Spec 2 (awq, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** Int4
- **Engines**: vLLM, Transformers
-- **Model ID:** dengcao/GLM-4.1V-9B-Thinking-AWQ
-- **Model Hubs**: `Hugging Face <https://huggingface.co/dengcao/GLM-4.1V-9B-Thinking-AWQ>`__, `ModelScope <https://modelscope.cn/models/dengcao/GLM-4.1V-9B-Thinking-AWQ>`__
+- **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-AWQ
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/GLM-4.1V-9B-Thinking-AWQ>`__, `ModelScope <https://modelscope.cn/models/tclf90/GLM-4.1V-9B-Thinking-AWQ>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -53,8 +53,8 @@ Model Spec 3 (gptq, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** Int4-Int8Mix
- **Engines**: vLLM, Transformers
-- **Model ID:** dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix
-- **Model Hubs**: `Hugging Face <https://huggingface.co/dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__, `ModelScope <https://modelscope.cn/models/dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__
+- **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__, `ModelScope <https://modelscope.cn/models/tclf90/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
111 changes: 111 additions & 0 deletions doc/source/models/builtin/llm/glm-4.5.rst
@@ -0,0 +1,111 @@
.. _models_llm_glm-4.5:

========================================
glm-4.5
========================================

- **Context Length:** 65536
- **Model Name:** glm-4.5
- **Languages:** en, zh
- **Abilities:** chat, reasoning
- **Description:** The GLM-4.5 series models are foundation models designed for intelligent agents.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 355 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 355
- **Quantizations:** none
- **Engines**: Transformers
- **Model ID:** zai-org/GLM-4.5
- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format pytorch --quantization ${quantization}


Model Spec 2 (fp8, 355 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 355
- **Quantizations:** FP8
- **Engines**:
- **Model ID:** zai-org/GLM-4.5-FP8
- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-FP8>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-FP8>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format fp8 --quantization ${quantization}


Model Spec 3 (mlx, 355 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 355
- **Quantizations:** 4bit
- **Engines**: MLX
- **Model ID:** mlx-community/GLM-4.5-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/GLM-4.5-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/GLM-4.5-{quantization}>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format mlx --quantization ${quantization}
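The ``{quantization}`` placeholder in the MLX model ID above is filled in with the quantization chosen at launch time. As an illustrative sketch of that substitution (this mirrors what the spec describes; it is not Xinference internals), the resolved repository name can be derived like this:

```python
# Resolve the templated MLX model ID from the spec above.
# Illustrative only: the template string comes from the doc, the helper is hypothetical.
MODEL_ID_TEMPLATE = "mlx-community/GLM-4.5-{quantization}"

def resolve_model_id(quantization: str) -> str:
    """Fill the {quantization} placeholder with the chosen option (here: 4bit)."""
    return MODEL_ID_TEMPLATE.format(quantization=quantization)

print(resolve_model_id("4bit"))  # mlx-community/GLM-4.5-4bit
```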


Model Spec 4 (pytorch, 106 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 106
- **Quantizations:** none
- **Engines**: Transformers
- **Model ID:** zai-org/GLM-4.5-Air
- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-Air>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format pytorch --quantization ${quantization}


Model Spec 5 (fp8, 106 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** fp8
- **Model Size (in billions):** 106
- **Quantizations:** FP8
- **Engines**:
- **Model ID:** zai-org/GLM-4.5-Air-FP8
- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-Air-FP8>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air-FP8>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format fp8 --quantization ${quantization}


Model Spec 6 (mlx, 106 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 106
- **Quantizations:** 2bit, 3bit, 4bit, 5bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/GLM-4.5-Air-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/GLM-4.5-Air-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/GLM-4.5-Air-{quantization}>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format mlx --quantization ${quantization}
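Each spec's fields map one-to-one onto ``xinference launch`` flags. As a hedged sketch (the helper below is hypothetical and not part of Xinference; it only reproduces the command pattern shown in the specs above), the launch command for any spec on this page can be assembled like this:

```python
def launch_command(engine: str, model_name: str, size_in_billions: int,
                   model_format: str, quantization: str) -> str:
    """Build the `xinference launch` command string from a spec's fields.

    Hypothetical helper for illustration; it simply reproduces the
    command format documented in the model specs above.
    """
    return (
        f"xinference launch --model-engine {engine} "
        f"--model-name {model_name} "
        f"--size-in-billions {size_in_billions} "
        f"--model-format {model_format} "
        f"--quantization {quantization}"
    )

# Model Spec 6 (mlx, 106 Billion) with the 4bit quantization:
print(launch_command("MLX", "glm-4.5", 106, "mlx", "4bit"))
```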

4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/glm-4v.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: Transformers
-- **Model ID:** THUDM/glm-4v-9b
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4v-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4v-9b>`__
+- **Model ID:** zai-org/glm-4v-9b
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4v-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4v-9b>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
24 changes: 12 additions & 12 deletions doc/source/models/builtin/llm/glm-edge-chat.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 1_5 Billion)
- **Model Size (in billions):** 1_5
- **Quantizations:** none
- **Engines**: Transformers
-- **Model ID:** THUDM/glm-edge-1.5b-chat
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -37,8 +37,8 @@ Model Spec 2 (pytorch, 4 Billion)
- **Model Size (in billions):** 4
- **Quantizations:** none
- **Engines**: Transformers
-- **Model ID:** THUDM/glm-edge-4b-chat
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat>`__
+- **Model ID:** zai-org/glm-edge-4b-chat
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -53,8 +53,8 @@ Model Spec 3 (ggufv2, 1_5 Billion)
- **Model Size (in billions):** 1_5
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -69,8 +69,8 @@ Model Spec 4 (ggufv2, 1_5 Billion)
- **Model Size (in billions):** 1_5
- **Quantizations:** F16
- **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -85,8 +85,8 @@ Model Spec 5 (ggufv2, 4 Billion)
- **Model Size (in billions):** 4
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-4b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-4b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -101,8 +101,8 @@ Model Spec 6 (ggufv2, 4 Billion)
- **Model Size (in billions):** 4
- **Quantizations:** F16
- **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-4b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-4b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
8 changes: 4 additions & 4 deletions doc/source/models/builtin/llm/glm4-0414.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4-9B-0414
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4-9B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-9B-0414>`__
+- **Model ID:** zai-org/GLM-4-9B-0414
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4-9B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-9B-0414>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
@@ -37,8 +37,8 @@ Model Spec 2 (pytorch, 32 Billion)
- **Model Size (in billions):** 32
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4-32B-0414
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4-32B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-32B-0414>`__
+- **Model ID:** zai-org/GLM-4-32B-0414
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4-32B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-32B-0414>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/glm4-chat-1m.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/glm-4-9b-chat-1m-hf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf>`__
+- **Model ID:** zai-org/glm-4-9b-chat-1m-hf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4-9b-chat-1m-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/glm4-chat.rst
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
- **Model Size (in billions):** 9
- **Quantizations:** none
- **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/glm-4-9b-chat-hf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4-9b-chat-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf>`__
+- **Model ID:** zai-org/glm-4-9b-chat-hf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4-9b-chat-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf>`__

Execute the following command to launch the model, remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::
7 changes: 7 additions & 0 deletions doc/source/models/builtin/llm/index.rst
@@ -186,6 +186,11 @@ The following is a list of built-in LLM in Xinference:
- 65536
- GLM-4.1V-9B-Thinking, designed to explore the upper limits of reasoning in vision-language models.

+* - :ref:`glm-4.5 <models_llm_glm-4.5>`
+- chat, reasoning
+- 65536
+- The GLM-4.5 series models are foundation models designed for intelligent agents.

* - :ref:`glm-4v <models_llm_glm-4v>`
- chat, vision
- 8192
@@ -694,6 +699,8 @@ The following is a list of built-in LLM in Xinference:

glm-4.1v-thinking

+glm-4.5

glm-4v

glm-edge-chat