From dbce23b521bb103411002a78bc11d57dd3a5cccc Mon Sep 17 00:00:00 2001
From: OliverBryant <2713999266@qq.com>
Date: Wed, 28 Jan 2026 15:10:57 +0800
Subject: [PATCH 1/8] add v2.0 doc

---
 .../development/contributing_environment.rst  |   2 +-
 doc/source/getting_started/environments.rst   |  79 ++++-
 doc/source/getting_started/installation.rst   |   9 +-
 .../getting_started/using_docker_image.rst    |   6 +-
 .../getting_started/environments.po           | 202 ++++++++---
 .../getting_started/installation.po           | 248 +++++++------
 .../getting_started/using_docker_image.po     | 125 ++++---
 .../locale/zh_CN/LC_MESSAGES/models/custom.po | 315 ++++++++++-------
 .../zh_CN/LC_MESSAGES/models/virtualenv.po    | 325 ++++++++++++------
 .../zh_CN/LC_MESSAGES/reference/index.po      | 274 ++++++++++++---
 .../zh_CN/LC_MESSAGES/user_guide/backends.po  | 210 +++++------
 .../user_guide/distributed_inference.po       |  69 ++--
 .../zh_CN/LC_MESSAGES/user_guide/launch.po    |  88 ++++-
 doc/source/models/custom.rst                  |  32 +-
 doc/source/models/virtualenv.rst              | 116 ++++++-
 doc/source/reference/index.rst                |  34 ++
 doc/source/user_guide/backends.rst            |   3 +-
 .../user_guide/distributed_inference.rst      |   5 +
 doc/source/user_guide/launch.rst              |  43 ++-
 19 files changed, 1448 insertions(+), 737 deletions(-)

diff --git a/doc/source/development/contributing_environment.rst b/doc/source/development/contributing_environment.rst
index 8f66761857..6f4b084250 100644
--- a/doc/source/development/contributing_environment.rst
+++ b/doc/source/development/contributing_environment.rst
@@ -61,7 +61,7 @@ Conda environment.
 Here are the commands: ::
 
-    conda install python=3.10
+    conda install python=3.12
     conda install nodejs
 
 Install from source code

diff --git a/doc/source/getting_started/environments.rst b/doc/source/getting_started/environments.rst
index 7fe8f3be5f..c78c9bfa52 100644
--- a/doc/source/getting_started/environments.rst
+++ b/doc/source/getting_started/environments.rst
@@ -23,15 +23,20 @@ necessary files such as logs and models, where ``<HOME>`` is the home
 path of current user. You can change this directory by configuring this
 environment variable.
 
-XINFERENCE_HEALTH_CHECK_ATTEMPTS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The number of attempts for the health check at Xinference startup, if exceeded,
-will result in an error. The default value is 3.
+XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The maximum number of failed health checks tolerated at Xinference startup.
+Default value is 5.
 
 XINFERENCE_HEALTH_CHECK_INTERVAL
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The timeout duration for the health check at Xinference startup, if exceeded,
-will result in an error. The default value is 3.
+Health check interval (seconds) at Xinference startup.
+Default value is 5.
+
+XINFERENCE_HEALTH_CHECK_TIMEOUT
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Health check timeout (seconds) at Xinference startup.
+Default value is 10.
 
 XINFERENCE_DISABLE_HEALTH_CHECK
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,3 +48,65 @@ XINFERENCE_DISABLE_METRICS
 Xinference will by default enable the metrics exporter on the supervisor and
 worker. Setting this environment to 1 will disable the /metrics endpoint on the
 supervisor and the HTTP service (only provide the /metrics endpoint) on the worker.
+
+XINFERENCE_DOWNLOAD_MAX_ATTEMPTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Maximum download retry attempts for model files.
+Default value is 3.
+
+XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Enable continuous batching for text-to-image models by specifying the target image size
+(e.g., ``1024*1024``). Default is unset.
+
+XINFERENCE_SSE_PING_ATTEMPTS_SECONDS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Server-Sent Events keepalive ping interval (seconds).
+Default value is 600.
+
+XINFERENCE_MAX_TOKENS
+~~~~~~~~~~~~~~~~~~~~~
+Global max tokens limit override for requests. Default is unset.
+
+XINFERENCE_ALLOWED_IPS
+~~~~~~~~~~~~~~~~~~~~~~
+Restrict access to specified IPs or CIDR blocks. Default is unset (no restriction).
+
+XINFERENCE_BATCH_SIZE
+~~~~~~~~~~~~~~~~~~~~~
+Default batch size used by the server when batching is enabled.
+Default value is 32.
+
+XINFERENCE_BATCH_INTERVAL
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Default batching interval (seconds).
+Default value is 0.003.
+
+XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Whether to allow multiple replicas on a single GPU.
+Default value is 1 (enabled).
+
+XINFERENCE_LAUNCH_STRATEGY
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+GPU allocation strategy for replicas. Default is ``IDLE_FIRST_LAUNCH_STRATEGY``.
+
+XINFERENCE_ENABLE_VIRTUAL_ENV
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Enable model virtual environments globally.
+Default value is 1 (enabled, starting from v2.0).
+
+XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Skip packages already present in system site-packages when creating virtual environments.
+Default value is 1.
+
+XINFERENCE_CSG_TOKEN
+~~~~~~~~~~~~~~~~~~~~
+Authentication token for CSGHub model source.
+Default is unset.
+
+XINFERENCE_CSG_ENDPOINT
+~~~~~~~~~~~~~~~~~~~~~~~
+CSGHub endpoint for model source.
+Default value is ``https://hub-stg.opencsg.com/``.
diff --git a/doc/source/getting_started/installation.rst b/doc/source/getting_started/installation.rst
index 94a36aec3b..433535d192 100644
--- a/doc/source/getting_started/installation.rst
+++ b/doc/source/getting_started/installation.rst
@@ -43,12 +43,18 @@ PyTorch (transformers) supports the inference of most state-of-art models. It is
 
     pip install "xinference[transformers]"
 
+Notes:
+
+- The transformers engine supports ``pytorch`` / ``gptq`` / ``awq`` / ``bnb`` / ``fp4`` formats.
+- FP4 format requires ``transformers`` with ``FPQuantConfig`` support. If you see an import error,
+  please upgrade ``transformers`` to a newer version.
+
 vLLM Backend
 ~~~~~~~~~~~~
 vLLM is a fast and easy-to-use library for LLM inference and serving. Xinference will
 choose vLLM as the backend to achieve better throughput when the following conditions are met:
 
-- The model format is ``pytorch``, ``gptq`` or ``awq``.
+- The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or ``bnb``.
 - When the model format is ``pytorch``, the quantization is ``none``.
 - When the model format is ``awq``, the quantization is ``Int4``.
 - When the model format is ``gptq``, the quantization is ``Int3``, ``Int4`` or ``Int8``.
@@ -142,4 +148,3 @@ Other Platforms
 ~~~~~~~~~~~~~~~
 
 * :ref:`Ascend NPU <installation_npu>`
-

diff --git a/doc/source/getting_started/using_docker_image.rst b/doc/source/getting_started/using_docker_image.rst
index 44b744b29a..3e8b293736 100644
--- a/doc/source/getting_started/using_docker_image.rst
+++ b/doc/source/getting_started/using_docker_image.rst
@@ -26,9 +26,12 @@ Available tags include:
 
 * ``v<release version>``: This image is built each time a Xinference release version is published, and it is typically more stable.
 * ``latest``: This image is built with the latest Xinference release version.
 * For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``.
-* For CUDA 12.8, add ``-cu128`` suffix, e.g. ``nightly-main-cu128``.
(Xinference version should be between v1.8.1 and v1.15.0) * For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. (Xinference version should be v1.16.0 at least) +.. note:: + + Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images are officially provided. + Dockerfile for custom build =========================== @@ -96,4 +99,3 @@ at /.cache/huggingface and /.cache/modelscope. The command xprobe/xinference:v \ xinference-local -H 0.0.0.0 - diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po index e8a2a368b6..4b838dc91d 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/environments.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2024-07-28 22:01+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -17,7 +17,7 @@ msgstr "" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" -"Generated-By: Babel 2.14.0\n" +"Generated-By: Babel 2.17.0\n" #: ../../source/getting_started/environments.rst:5 msgid "Environments Variables" @@ -32,8 +32,8 @@ msgid "" "Endpoint of Xinference, used to connect to Xinference service. Default " "value is http://127.0.0.1:9997 , you can get it through logs." msgstr "" -"Xinference 的服务地址,用来与 Xinference 连接。默认地址是 http://127.0." -"0.1:9997,可以在日志中获得这个地址。" +"Xinference 的服务地址,用来与 Xinference 连接。默认地址是 " +"http://127.0.0.1:9997,可以在日志中获得这个地址。" #: ../../source/getting_started/environments.rst:13 msgid "XINFERENCE_MODEL_SRC" @@ -43,74 +43,190 @@ msgstr "XINFERENCE_MODEL_SRC" msgid "" "Modelhub used for downloading models. Default is \"huggingface\", or you " "can set \"modelscope\" as downloading source." 
-msgstr "" -"配置模型下载仓库。默认下载源是 \"huggingface\",也可以设置为 \"modelscope" -"\" 作为下载源。" +msgstr "配置模型下载仓库。默认下载源是 \"huggingface\",也可以设置为 \"modelscope\" 作为下载源。" -#: ../../source/getting_started/environments.rst:18 +#: ../../source/getting_started/environments.rst:20 msgid "XINFERENCE_HOME" msgstr "XINFERENCE_HOME" -#: ../../source/getting_started/environments.rst:19 +#: ../../source/getting_started/environments.rst:21 msgid "" "By default, Xinference uses ``/.xinference`` as home path to store " "necessary files such as logs and models, where ```` is the home " "path of current user. You can change this directory by configuring this " "environment variable." msgstr "" -"Xinference 默认使用 ``/.xinference`` 作为默认目录来存储模型以及日志" -"等必要的文件。其中 ```` 是当前用户的主目录。可以通过配置这个" -"环境变量来修改默认目录。" +"Xinference 默认使用 ``/.xinference`` 作为默认目录来存储模型以及日志等必要的文件。其中 " +"```` 是当前用户的主目录。可以通过配置这个环境变量来修改默认目录。" -#: ../../source/getting_started/environments.rst:25 -msgid "XINFERENCE_HEALTH_CHECK_ATTEMPTS" -msgstr "XINFERENCE_HEALTH_CHECK_ATTEMPTS" +#: ../../source/getting_started/environments.rst:27 +msgid "XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD" +msgstr "XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD" -#: ../../source/getting_started/environments.rst:26 +#: ../../source/getting_started/environments.rst:28 msgid "" -"The number of attempts for the health check at Xinference startup, if " -"exceeded, will result in an error. The default value is 3." -msgstr "" -"Xinference 启动时健康检查的次数,如果超过这个次数还未成功,启动会报错," -"默认值为 3。" +"The maximum number of failed health checks tolerated at Xinference " +"startup. Default value is 5." +msgstr "Xinference启动时允许的最大健康检查失败次数。默认值为5。" -#: ../../source/getting_started/environments.rst:30 +#: ../../source/getting_started/environments.rst:32 msgid "XINFERENCE_HEALTH_CHECK_INTERVAL" msgstr "XINFERENCE_HEALTH_CHECK_INTERVAL" -#: ../../source/getting_started/environments.rst:31 -msgid "" -"The timeout duration for the health check at Xinference startup, if " -"exceeded, will result in an error. 
The default value is 3."
-msgstr ""
-"Xinference 启动时健康检查的时间间隔,如果超过这个时间还未成功,启动会报错"
-",默认值为 3。"
+#: ../../source/getting_started/environments.rst:33
+msgid "Health check interval (seconds) at Xinference startup. Default value is 5."
+msgstr "Xinference启动时的健康检查间隔(秒)。默认值为5。"

-#: ../../source/getting_started/environments.rst:35
-#, fuzzy
+#: ../../source/getting_started/environments.rst:37
+msgid "XINFERENCE_HEALTH_CHECK_TIMEOUT"
+msgstr "XINFERENCE_HEALTH_CHECK_TIMEOUT"
+
+#: ../../source/getting_started/environments.rst:38
+msgid "Health check timeout (seconds) at Xinference startup. Default value is 10."
+msgstr "Xinference启动时的健康检查超时时间(秒)。默认值为10。"
+
+#: ../../source/getting_started/environments.rst:42
 msgid "XINFERENCE_DISABLE_HEALTH_CHECK"
-msgstr "XINFERENCE_DISABLE_VLLM"
+msgstr "XINFERENCE_DISABLE_HEALTH_CHECK"

-#: ../../source/getting_started/environments.rst:36
+#: ../../source/getting_started/environments.rst:43
 msgid ""
 "Xinference will automatically report health check at Xinference startup. "
 "Setting this environment to 1 can disable health check."
-msgstr ""
-"在满足条件时,Xinference 会自动汇报worker健康状况,设置改环境变量为 1可以"
-"禁用健康检查。"
+msgstr "在满足条件时,Xinference 会自动汇报 worker 健康状况,设置该环境变量为 1 可以禁用健康检查。"

-#: ../../source/getting_started/environments.rst:40
-#, fuzzy
+#: ../../source/getting_started/environments.rst:47
 msgid "XINFERENCE_DISABLE_METRICS"
-msgstr "XINFERENCE_DISABLE_VLLM"
+msgstr "XINFERENCE_DISABLE_METRICS"

-#: ../../source/getting_started/environments.rst:41
+#: ../../source/getting_started/environments.rst:48
 msgid ""
 "Xinference will by default enable the metrics exporter on the supervisor "
 "and worker. Setting this environment to 1 will disable the /metrics "
 "endpoint on the supervisor and the HTTP service (only provide the "
 "/metrics endpoint) on the worker."
msgstr "" -"Xinference 会默认在 supervisor 和 worker 上启用 metrics exporter。设置" -"环境变量为 1可以在 supervisor 上禁用 /metrics 端点,并在 worker 上禁用 " -"HTTP 服务(仅提供 /metrics 端点)" +"Xinference 会默认在 supervisor 和 worker 上启用 metrics exporter。设置环境变量为 1可以在 " +"supervisor 上禁用 /metrics 端点,并在 worker 上禁用 HTTP 服务(仅提供 /metrics 端点)" + +#: ../../source/getting_started/environments.rst:53 +msgid "XINFERENCE_DOWNLOAD_MAX_ATTEMPTS" +msgstr "XINFERENCE_DOWNLOAD_MAX_ATTEMPTS" + +#: ../../source/getting_started/environments.rst:54 +msgid "Maximum download retry attempts for model files. Default value is 3." +msgstr "模型文件的最大下载重试次数。默认值为3。" + +#: ../../source/getting_started/environments.rst:58 +msgid "XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE" +msgstr "XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE" + +#: ../../source/getting_started/environments.rst:59 +msgid "" +"Enable continuous batching for text-to-image models by specifying the " +"target image size (e.g., ``1024*1024``). Default is unset." +msgstr "通过指定目标图像尺寸(例如 ``1024*1024`` )为文本转图像模型启用连续批处理。默认未设置。" + +#: ../../source/getting_started/environments.rst:63 +msgid "XINFERENCE_SSE_PING_ATTEMPTS_SECONDS" +msgstr "XINFERENCE_SSE_PING_ATTEMPTS_SECONDS" + +#: ../../source/getting_started/environments.rst:64 +msgid "" +"Server-Sent Events keepalive ping interval (seconds). Default value is " +"600." +msgstr "服务器发送事件保持活动状态的ping间隔(秒)。默认值为600。" + +#: ../../source/getting_started/environments.rst:68 +msgid "XINFERENCE_MAX_TOKENS" +msgstr "XINFERENCE_MAX_TOKENS" + +#: ../../source/getting_started/environments.rst:69 +msgid "Global max tokens limit override for requests. Default is unset." +msgstr "请求的全局最大tokens限制覆盖。默认值为未设置。" + +#: ../../source/getting_started/environments.rst:72 +msgid "XINFERENCE_ALLOWED_IPS" +msgstr "XINFERENCE_ALLOWED_IPS" + +#: ../../source/getting_started/environments.rst:73 +msgid "" +"Restrict access to specified IPs or CIDR blocks. Default is unset (no " +"restriction)." 
+msgstr "限制访问特定IP地址或CIDR地址块。默认未设置(无限制)。" + +#: ../../source/getting_started/environments.rst:76 +msgid "XINFERENCE_BATCH_SIZE" +msgstr "XINFERENCE_BATCH_SIZE" + +#: ../../source/getting_started/environments.rst:77 +msgid "" +"Default batch size used by the server when batching is enabled. Default " +"value is 32." +msgstr "启用批处理时服务器使用的默认批处理大小。默认值为32。" + +#: ../../source/getting_started/environments.rst:81 +msgid "XINFERENCE_BATCH_INTERVAL" +msgstr "XINFERENCE_BATCH_INTERVAL" + +#: ../../source/getting_started/environments.rst:82 +msgid "Default batching interval (seconds). Default value is 0.003." +msgstr "默认批处理间隔(秒)。默认值为0.003。" + +#: ../../source/getting_started/environments.rst:86 +msgid "XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU" +msgstr "XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU" + +#: ../../source/getting_started/environments.rst:87 +msgid "" +"Whether to allow multiple replicas on a single GPU. Default value is 1 " +"(enabled)." +msgstr "是否允许在单个GPU上创建多个副本。默认值为1 (启用)。" + +#: ../../source/getting_started/environments.rst:91 +msgid "XINFERENCE_LAUNCH_STRATEGY" +msgstr "XINFERENCE_LAUNCH_STRATEGY" + +#: ../../source/getting_started/environments.rst:92 +msgid "" +"GPU allocation strategy for replicas. Default is " +"``IDLE_FIRST_LAUNCH_STRATEGY``." +msgstr "副本的GPU分配策略。默认值为 ``IDLE_FIRST_LAUNCH_STRATEGY`` 。" + +#: ../../source/getting_started/environments.rst:95 +msgid "XINFERENCE_ENABLE_VIRTUAL_ENV" +msgstr "XINFERENCE_ENABLE_VIRTUAL_ENV" + +#: ../../source/getting_started/environments.rst:96 +msgid "" +"Enable model virtual environments globally. Default value is 1 (enabled, " +"starting from v2.0)." +msgstr "全局启用模型虚拟环境。默认值为1(启用,自v2.0版本生效)" + +#: ../../source/getting_started/environments.rst:100 +msgid "XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED" +msgstr "XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED" + +#: ../../source/getting_started/environments.rst:101 +msgid "" +"Skip packages already present in system site-packages when creating " +"virtual environments. Default value is 1." 
+msgstr "创建虚拟环境时跳过系统site-packages中已存在的包。默认值为1。" + +#: ../../source/getting_started/environments.rst:105 +msgid "XINFERENCE_CSG_TOKEN" +msgstr "XINFERENCE_CSG_TOKEN" + +#: ../../source/getting_started/environments.rst:106 +msgid "Authentication token for CSGHub model source. Default is unset." +msgstr "CSGHub模型源的认证令牌。默认值为未设置。" + +#: ../../source/getting_started/environments.rst:110 +msgid "XINFERENCE_CSG_ENDPOINT" +msgstr "XINFERENCE_CSG_ENDPOINT" + +#: ../../source/getting_started/environments.rst:111 +msgid "" +"CSGHub endpoint for model source. Default value is ``https://hub-" +"stg.opencsg.com/``." +msgstr "CSGHub 模型源端点。默认值为 ``https://hub-stg.opencsg.com/`` 。" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po index da106cbb00..2f7f67cb4c 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/installation.po @@ -7,7 +7,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-07-30 11:01+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -28,8 +28,8 @@ msgid "" " run models using Xinference, you will need to install the backend " "corresponding to the type of model you intend to serve." msgstr "" -"Xinference 在 Linux, Windows, MacOS 上都可以通过 ``pip`` 来安装。如果需要" -"使用 Xinference 进行模型推理,可以根据不同的模型指定不同的引擎。" +"Xinference 在 Linux, Windows, MacOS 上都可以通过 ``pip`` 来安装。如果需要使用 Xinference " +"进行模型推理,可以根据不同的模型指定不同的引擎。" #: ../../source/getting_started/installation.rst:8 msgid "" @@ -44,8 +44,8 @@ msgid "" "sglang, please install it separately via ``pip install " "'xinference[sglang]'``." 
msgstr "" -"由于 vllm 和 sglang 在包依赖上无法调和,因此,我们从 all 里移除了 sglang" -",如果要使用 sglang,请使用 ``pip install 'xinference[sglang]'`` 。" +"由于 vllm 和 sglang 在包依赖上无法调和,因此,我们从 all 里移除了 sglang,如果要使用 sglang,请使用 ``pip " +"install 'xinference[sglang]'`` 。" #: ../../source/getting_started/installation.rst:17 msgid "Several usage scenarios require special attention." @@ -60,9 +60,7 @@ msgid "" "In this situation, it's advised to install its dependencies manually " "based on your hardware specifications to enable acceleration. For more " "details, see the :ref:`installation_gguf` section." -msgstr "" -"在这种情况下,建议根据您的硬件规格手动安装其依赖项以启用加速。更多详情请" -"参见 :ref:`installation_gguf` 部分。" +msgstr "在这种情况下,建议根据您的硬件规格手动安装其依赖项以启用加速。更多详情请参见 :ref:`installation_gguf` 部分。" #: ../../source/getting_started/installation.rst:23 msgid "**AWQ or GPTQ** format with **transformers engine**" @@ -76,17 +74,15 @@ msgstr "**本节内容新增于 v1.6.0。**" msgid "" "This is because the dependencies at this stage require special options " "and are difficult to install. Please run command below in advance" -msgstr "" -"这是因为此阶段的依赖项需要特殊选项,并且安装起来比较困难。请提前运行以下" -"命令" +msgstr "这是因为此阶段的依赖项需要特殊选项,并且安装起来比较困难。请提前运行以下命令" #: ../../source/getting_started/installation.rst:33 msgid "" "Some dependencies like ``transformers`` might be downgraded, you can run " "``pip install \"xinference[all]\"`` afterwards." msgstr "" -"某些依赖项,如 ``transformers``,可能会被降级,您可以之后运行 ``pip " -"install \"xinference[all]\"``。" +"某些依赖项,如 ``transformers``,可能会被降级,您可以之后运行 ``pip install " +"\"xinference[all]\"``。" #: ../../source/getting_started/installation.rst:36 msgid "" @@ -102,118 +98,110 @@ msgstr "Transformers 引擎" msgid "" "PyTorch (transformers) supports the inference of most state-of-art " "models. 
It is the default backend for models in PyTorch format::"
-msgstr ""
-"PyTorch(transformers) 引擎支持几乎有所的最新模型,这是 Pytorch 模型默认"
-"使用的引擎:"
+msgstr "PyTorch(transformers) 引擎支持几乎所有的最新模型,这是 PyTorch 模型默认使用的引擎:"
+
+#: ../../source/getting_started/installation.rst:46
+msgid "Notes:"
+msgstr "注意:"
+
 #: ../../source/getting_started/installation.rst:48
+msgid ""
+"The transformers engine supports ``pytorch`` / ``gptq`` / ``awq`` / "
+"``bnb`` / ``fp4`` formats."
+msgstr "Transformers 引擎支持 ``pytorch`` / ``gptq`` / ``awq`` / ``bnb`` / ``fp4`` 格式。"
+
+#: ../../source/getting_started/installation.rst:49
+msgid ""
+"FP4 format requires ``transformers`` with ``FPQuantConfig`` support. If "
+"you see an import error, please upgrade ``transformers`` to a newer "
+"version."
+msgstr "FP4 格式需要支持 FPQuantConfig 的 transformers 库。若遇到导入错误,请将 transformers 升级至新版本。"
+
+#: ../../source/getting_started/installation.rst:54
 msgid "vLLM Backend"
 msgstr "vLLM 引擎"

-#: ../../source/getting_started/installation.rst:49
+#: ../../source/getting_started/installation.rst:55
 msgid ""
 "vLLM is a fast and easy-to-use library for LLM inference and serving. "
 "Xinference will choose vLLM as the backend to achieve better throughput "
 "when the following conditions are met:"
-msgstr ""
-"vLLM 是一个支持高并发的高性能大模型推理引擎。当满足以下条件时,Xinference"
-" 会自动选择 vllm 作为引擎来达到更高的吞吐量:"
+msgstr "vLLM 是一个支持高并发的高性能大模型推理引擎。当满足以下条件时,Xinference 会自动选择 vllm 作为引擎来达到更高的吞吐量:"

-#: ../../source/getting_started/installation.rst:51
-msgid "The model format is ``pytorch``, ``gptq`` or ``awq``."
-msgstr "模型格式为 ``pytorch`` , ``gptq`` 或者 ``awq`` 。"
+#: ../../source/getting_started/installation.rst:57
+msgid ""
+"The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or "
+"``bnb``."
+msgstr "模型格式为 ``pytorch`` , ``gptq`` , ``awq`` , ``fp4`` , ``fp8`` 或者 ``bnb`` 。"

-#: ../../source/getting_started/installation.rst:52
+#: ../../source/getting_started/installation.rst:58
 msgid "When the model format is ``pytorch``, the quantization is ``none``."
msgstr "当模型格式为 ``pytorch`` 时,量化选项需为 ``none`` 。" -#: ../../source/getting_started/installation.rst:53 +#: ../../source/getting_started/installation.rst:59 msgid "When the model format is ``awq``, the quantization is ``Int4``." msgstr "当模型格式为 ``awq`` 时,量化选项需为 ``Int4`` 。" -#: ../../source/getting_started/installation.rst:54 +#: ../../source/getting_started/installation.rst:60 msgid "" "When the model format is ``gptq``, the quantization is ``Int3``, ``Int4``" " or ``Int8``." -msgstr "" -"当模型格式为 ``gptq`` 时,量化选项需为 ``Int3`` 、 ``Int4`` 或者 ``Int8``" -" 。" +msgstr "当模型格式为 ``gptq`` 时,量化选项需为 ``Int3`` 、 ``Int4`` 或者 ``Int8`` 。" -#: ../../source/getting_started/installation.rst:55 +#: ../../source/getting_started/installation.rst:61 msgid "The system is Linux and has at least one CUDA device" msgstr "操作系统为 Linux 并且至少有一个支持 CUDA 的设备" -#: ../../source/getting_started/installation.rst:56 +#: ../../source/getting_started/installation.rst:62 msgid "" "The model family (for custom models) / model name (for builtin models) is" " within the list of models supported by vLLM" -msgstr "" -"自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM" -" 的支持列表中。" +msgstr "自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM 的支持列表中。" -#: ../../source/getting_started/installation.rst:58 +#: ../../source/getting_started/installation.rst:64 msgid "Currently, supported models include:" msgstr "目前,支持的模型包括:" -#: ../../source/getting_started/installation.rst:62 -msgid "" -"``llama-2``, ``llama-3``, ``llama-3.1``, ``llama-3.2-vision``, " -"``llama-2-chat``, ``llama-3-instruct``, ``llama-3.1-instruct``, " -"``llama-3.3-instruct``" -msgstr "" - -#: ../../source/getting_started/installation.rst:63 -msgid "" -"``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``, " -"``mistral-instruct-v0.3``, ``mistral-nemo-instruct``, ``mistral-large-" -"instruct``" -msgstr "" - -#: ../../source/getting_started/installation.rst:64 -msgid "``codestral-v0.1``" -msgstr "" - -#: 
../../source/getting_started/installation.rst:65 -msgid "``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``" -msgstr "" - -#: ../../source/getting_started/installation.rst:66 -msgid "``code-llama``, ``code-llama-python``, ``code-llama-instruct``" -msgstr "" - -#: ../../source/getting_started/installation.rst:67 -msgid "" -"``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-" -"instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, " -"``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, " -"``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-" -"prover-v2``, ``deepseek-r1-0528-qwen3``, ``deepseek-r1-distill-llama``" -msgstr "" - #: ../../source/getting_started/installation.rst:68 -msgid "``yi-coder``, ``yi-coder-chat``" +msgid "" +"``code-llama``, ``code-llama-instruct``, ``code-llama-python``, " +"``deepseek``, ``deepseek-chat``, ``deepseek-coder``, ``deepseek-coder-" +"instruct``, ``deepseek-r1-distill-llama``, ``gorilla-openfunctions-v2``, " +"``HuatuoGPT-o1-LLaMA-3.1``, ``llama-2``, ``llama-2-chat``, ``llama-3``, " +"``llama-3-instruct``, ``llama-3.1``, ``llama-3.1-instruct``, " +"``llama-3.3-instruct``, ``tiny-llama``, ``wizardcoder-python-v1.0``, " +"``wizardmath-v1.0``, ``Yi``, ``Yi-1.5``, ``Yi-1.5-chat``, ``Yi-1.5-chat-" +"16k``, ``Yi-200k``, ``Yi-chat``" msgstr "" #: ../../source/getting_started/installation.rst:69 -msgid "``codeqwen1.5``, ``codeqwen1.5-chat``" +msgid "" +"``codestral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``," +" ``mistral-instruct-v0.3``, ``mistral-large-instruct``, ``mistral-nemo-" +"instruct``, ``mistral-v0.1``, ``openhermes-2.5``, ``seallm_v2``" msgstr "" #: ../../source/getting_started/installation.rst:70 msgid "" -"``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-" -"instruct``, ``qwen2.5-instruct-1m``" +"``Baichuan-M2``, ``codeqwen1.5``, ``codeqwen1.5-chat``, ``deepseek-r1" +"-distill-qwen``, ``DianJin-R1``, 
``fin-r1``, ``HuatuoGPT-o1-Qwen2.5``, " +"``KAT-V1``, ``marco-o1``, ``qwen1.5-chat``, ``qwen2-instruct``, " +"``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-coder-instruct``, " +"``qwen2.5-instruct``, ``qwen2.5-instruct-1m``, ``qwenLong-l1``, ``QwQ-" +"32B``, ``QwQ-32B-Preview``, ``seallms-v3``, ``skywork-or1``, ``skywork-" +"or1-preview``, ``XiYanSQL-QwenCoder-2504``" msgstr "" #: ../../source/getting_started/installation.rst:71 -msgid "``baichuan-2-chat``" +msgid "``llama-3.2-vision``, ``llama-3.2-vision-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:72 -msgid "``internlm2-chat``" +msgid "``baichuan-2``, ``baichuan-2-chat``" msgstr "" #: ../../source/getting_started/installation.rst:73 -msgid "``internlm2.5-chat``, ``internlm2.5-chat-1m``" +msgid "``InternLM2ForCausalLM``" msgstr "" #: ../../source/getting_started/installation.rst:74 @@ -221,172 +209,176 @@ msgid "``qwen-chat``" msgstr "" #: ../../source/getting_started/installation.rst:75 -msgid "``mixtral-instruct-v0.1``, ``mixtral-8x22B-instruct-v0.1``" +msgid "" +"``mixtral-8x22B-instruct-v0.1``, ``mixtral-instruct-v0.1``, " +"``mixtral-v0.1``" msgstr "" #: ../../source/getting_started/installation.rst:76 -msgid "``chatglm3``, ``chatglm3-32k``, ``chatglm3-128k``" +msgid "``cogagent``" msgstr "" #: ../../source/getting_started/installation.rst:77 -msgid "``glm4-chat``, ``glm4-chat-1m``, ``glm4-0414``" +msgid "``glm-edge-chat``, ``glm4-chat``, ``glm4-chat-1m``" msgstr "" #: ../../source/getting_started/installation.rst:78 -msgid "``codegeex4``" +msgid "``codegeex4``, ``glm-4v``" msgstr "" #: ../../source/getting_started/installation.rst:79 -msgid "``qwen1.5-chat``, ``qwen1.5-moe-chat``" +msgid "``seallm_v2.5``" msgstr "" #: ../../source/getting_started/installation.rst:80 -msgid "``qwen2-instruct``, ``qwen2-moe-instruct``" +msgid "``orion-chat``" msgstr "" #: ../../source/getting_started/installation.rst:81 -msgid "``XiYanSQL-QwenCoder-2504``" +msgid "``qwen1.5-moe-chat``, 
``qwen2-moe-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:82 -msgid "``QwQ-32B-Preview``, ``QwQ-32B``" +msgid "``CohereForCausalLM``" msgstr "" #: ../../source/getting_started/installation.rst:83 -msgid "``marco-o1``" +msgid "" +"``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, " +"``deepseek-vl2``" msgstr "" #: ../../source/getting_started/installation.rst:84 -msgid "``fin-r1``" +msgid "" +"``deepseek-prover-v2``, ``deepseek-r1``, ``deepseek-r1-0528``, " +"``deepseek-v3``, ``deepseek-v3-0324``, ``Deepseek-V3.1``, ``moonlight-" +"16b-a3b-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:85 -msgid "``seallms-v3``" +msgid "``deepseek-r1-0528-qwen3``, ``qwen3``" msgstr "" #: ../../source/getting_started/installation.rst:86 -msgid "``skywork-or1-preview``, ``skywork-or1``" +msgid "``minicpm3-4b``" msgstr "" #: ../../source/getting_started/installation.rst:87 -msgid "``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1``" +msgid "``internlm3-instruct``" msgstr "" #: ../../source/getting_started/installation.rst:88 -msgid "``DianJin-R1``" +msgid "``gemma-3-1b-it``" msgstr "" #: ../../source/getting_started/installation.rst:89 -msgid "``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it``" +msgid "``glm4-0414``" msgstr "" #: ../../source/getting_started/installation.rst:90 -msgid "``orion-chat``, ``orion-chat-rag``" +msgid "" +"``minicpm-2b-dpo-bf16``, ``minicpm-2b-dpo-fp16``, ``minicpm-2b-dpo-" +"fp32``, ``minicpm-2b-sft-bf16``, ``minicpm-2b-sft-fp32``, ``minicpm4``" msgstr "" #: ../../source/getting_started/installation.rst:91 -msgid "``c4ai-command-r-v01``" +msgid "``Ernie4.5``" msgstr "" #: ../../source/getting_started/installation.rst:92 -msgid "``minicpm3-4b``" +msgid "``Qwen3-Coder``, ``Qwen3-Instruct``, ``Qwen3-Thinking``" msgstr "" #: ../../source/getting_started/installation.rst:93 -msgid "``internlm3-instruct``" +msgid "``glm-4.5``" msgstr "" #: ../../source/getting_started/installation.rst:94 -msgid 
"``moonlight-16b-a3b-instruct``" +msgid "``gpt-oss``" msgstr "" #: ../../source/getting_started/installation.rst:95 -msgid "``qwenLong-l1``" +msgid "``seed-oss``" msgstr "" #: ../../source/getting_started/installation.rst:96 -msgid "``qwen3``" +msgid "``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``" msgstr "" #: ../../source/getting_started/installation.rst:97 -msgid "``minicpm4``" +msgid "``DeepSeek-V3.2``, ``DeepSeek-V3.2-Exp``" msgstr "" #: ../../source/getting_started/installation.rst:98 -msgid "``Ernie4.5``" -msgstr "" - -#: ../../source/getting_started/installation.rst:99 -msgid "``Qwen3-Instruct``" +msgid "``MiniMax-M2``" msgstr "" -#: ../../source/getting_started/installation.rst:102 +#: ../../source/getting_started/installation.rst:101 msgid "To install Xinference and vLLM::" msgstr "安装 xinference 和 vLLM:" -#: ../../source/getting_started/installation.rst:115 +#: ../../source/getting_started/installation.rst:114 msgid "Llama.cpp Backend" msgstr "Llama.cpp 引擎" -#: ../../source/getting_started/installation.rst:116 +#: ../../source/getting_started/installation.rst:115 msgid "" "Xinference supports models in ``gguf`` format via ``xllamacpp``. " "`xllamacpp `_ is developed by " "Xinference team, and is the sole backend for llama.cpp since v1.6.0." msgstr "" -"Xinference 通过 xllamacpp 支持 gguf 格式的模型。`xllamacpp `_ 由 Xinference 团队开发,并从 v1.6.0 " +"Xinference 通过 xllamacpp 支持 gguf 格式的模型。`xllamacpp " +"`_ 由 Xinference 团队开发,并从 v1.6.0 " "开始成为 llama.cpp 的唯一后端。" -#: ../../source/getting_started/installation.rst:122 +#: ../../source/getting_started/installation.rst:121 msgid "" "Since Xinference v1.5.0, ``llama-cpp-python`` is deprecated. Since " "Xinference v1.6.0, ``llama-cpp-python`` has been removed." 
msgstr "" -"自 Xinference v1.5.0 起,``llama-cpp-python`` 被弃用;在 Xinference 从 " -"v1.6.0 开始,该后端已被移除。" +"自 Xinference v1.5.0 起,``llama-cpp-python`` 被弃用;在 Xinference 从 v1.6.0 " +"开始,该后端已被移除。" -#: ../../source/getting_started/installation.rst:125 -#: ../../source/getting_started/installation.rst:135 -#: ../../source/getting_started/installation.rst:144 +#: ../../source/getting_started/installation.rst:124 +#: ../../source/getting_started/installation.rst:134 +#: ../../source/getting_started/installation.rst:143 msgid "Initial setup::" msgstr "初始步骤:" -#: ../../source/getting_started/installation.rst:129 +#: ../../source/getting_started/installation.rst:128 msgid "" "For more installation instructions for ``xllamacpp`` to enable GPU " "acceleration, please refer to: https://github.com/xorbitsai/xllamacpp" msgstr "" -"更多的 ``xllamacpp`` 安装说明以便开启 GPU 加速,请参考:https://github.com" -"/xorbitsai/xllamacpp" +"更多的 ``xllamacpp`` 安装说明以便开启 GPU " +"加速,请参考:https://github.com/xorbitsai/xllamacpp" -#: ../../source/getting_started/installation.rst:132 +#: ../../source/getting_started/installation.rst:131 msgid "SGLang Backend" msgstr "SGLang 引擎" -#: ../../source/getting_started/installation.rst:133 +#: ../../source/getting_started/installation.rst:132 msgid "" "SGLang has a high-performance inference runtime with RadixAttention. It " "significantly accelerates the execution of complex LLM programs by " "automatic KV cache reuse across multiple calls. And it also supports " "other common techniques like continuous batching and tensor parallelism." 
msgstr "" -"SGLang 具有基于 RadixAttention 的高性能推理运行时。它通过在多个调用之间" -"自动重用KV缓存,显著加速了复杂 LLM 程序的执行。它还支持其他常见推理技术," -"如连续批处理和张量并行处理。" +"SGLang 具有基于 RadixAttention 的高性能推理运行时。它通过在多个调用之间自动重用KV缓存,显著加速了复杂 LLM " +"程序的执行。它还支持其他常见推理技术,如连续批处理和张量并行处理。" -#: ../../source/getting_started/installation.rst:141 +#: ../../source/getting_started/installation.rst:140 msgid "MLX Backend" msgstr "MLX 引擎" -#: ../../source/getting_started/installation.rst:142 +#: ../../source/getting_started/installation.rst:141 msgid "MLX-lm is designed for Apple silicon users to run LLM efficiently." msgstr "MLX-lm 用来在苹果 silicon 芯片上提供高效的 LLM 推理。" -#: ../../source/getting_started/installation.rst:149 +#: ../../source/getting_started/installation.rst:148 msgid "Other Platforms" msgstr "其他平台" -#: ../../source/getting_started/installation.rst:151 +#: ../../source/getting_started/installation.rst:150 msgid ":ref:`Ascend NPU `" msgstr "" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po index 8f7592d944..805aae4904 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-12-29 11:34+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -17,7 +17,7 @@ msgstr "" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" -"Generated-By: Babel 2.14.0\n" +"Generated-By: Babel 2.17.0\n" #: ../../source/getting_started/using_docker_image.rst:5 msgid "Xinference Docker Image" @@ -35,9 +35,7 @@ msgstr "准备工作" msgid "" "The image can only run in an environment with GPUs and CUDA installed, " "because Xinference in the image relies on Nvidia GPUs for 
acceleration." -msgstr "" -"Xinference 使用 GPU 加速推理,该镜像需要在有 GPU 显卡并且安装 CUDA 的机器" -"上运行。" +msgstr "Xinference 使用 GPU 加速推理,该镜像需要在有 GPU 显卡并且安装 CUDA 的机器上运行。" #: ../../source/getting_started/using_docker_image.rst:13 msgid "" @@ -52,9 +50,8 @@ msgid "" "and the CUDA version on the host machine should be ``12.4`` or above, and" " the NVIDIA driver version should be ``550`` or above." msgstr "" -"对于 CUDA 版本小于 12.8,镜像中的 CUDA 版本为 ``12.4`` 。为了不出现预期" -"之外的问题,请将宿主机的 CUDA 版本和 NVIDIA Driver 版本分别升级到 ``12.4`" -"` 和 ``550`` 以上。" +"对于 CUDA 版本小于 12.8,镜像中的 CUDA 版本为 ``12.4`` 。为了不出现预期之外的问题,请将宿主机的 CUDA 版本和 " +"NVIDIA Driver 版本分别升级到 ``12.4`` 和 ``550`` 以上。" #: ../../source/getting_started/using_docker_image.rst:15 msgid "" @@ -62,8 +59,8 @@ msgid "" "``12.8``, and the CUDA version on the host machine should be ``12.8`` or " "above, and the NVIDIA driver version should be ``570`` or above." msgstr "" -"对于 CUDA 版本 >= 12.8 且 < 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.8``。" -"宿主机上的 CUDA 版本需为 ``12.8`` 或以上,同时 NVIDIA 驱动版本需为 ``570`` 或以上。" +"对于 CUDA 版本 >= 12.8 且 < 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.8``。宿主机上的 CUDA " +"版本需为 ``12.8`` 或以上,同时 NVIDIA 驱动版本需为 ``570`` 或以上。" #: ../../source/getting_started/using_docker_image.rst:16 msgid "" @@ -71,16 +68,16 @@ msgid "" "and the CUDA version on the host machine should be ``12.9`` or above, and" " the NVIDIA driver version should be ``575`` or above." msgstr "" -"对于 CUDA 版本 >= 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.9``。" -"宿主机上的 CUDA 版本需为 ``12.9`` 或以上,同时 NVIDIA 驱动版本需为 ``575`` 或以上。" +"对于 CUDA 版本 >= 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.9``。宿主机上的 CUDA 版本需为 " +"``12.9`` 或以上,同时 NVIDIA 驱动版本需为 ``575`` 或以上。" #: ../../source/getting_started/using_docker_image.rst:17 msgid "" "Ensure `NVIDIA Container Toolkit `_ installed." 
msgstr "" -"请确保已安装 `NVIDIA Container Toolkit `_ 。" +"请确保已安装 `NVIDIA Container Toolkit `_ 。" #: ../../source/getting_started/using_docker_image.rst:21 msgid "Docker Image" @@ -90,26 +87,20 @@ msgstr "Docker 镜像" msgid "" "The official image of Xinference is available on DockerHub in the " "repository ``xprobe/xinference``. Available tags include:" -msgstr "" -"Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。" -"当前可用的标签包括:" +msgstr "Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。当前可用的标签包括:" #: ../../source/getting_started/using_docker_image.rst:25 msgid "" "``nightly-main``: This image is built daily from the `GitHub main branch " "`_ and generally does not " "guarantee stability." -msgstr "" -"``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定" -"可靠。" +msgstr "``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定可靠。" #: ../../source/getting_started/using_docker_image.rst:26 msgid "" "``v``: This image is built each time a Xinference " "release version is published, and it is typically more stable." -msgstr "" -"``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常" -"可以认为是稳定可靠的。" +msgstr "``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常可以认为是稳定可靠的。" #: ../../source/getting_started/using_docker_image.rst:27 msgid "" @@ -123,25 +114,23 @@ msgstr "对于 CPU 版本,增加 ``-cpu`` 后缀,如 ``nightly-main-cpu``。 #: ../../source/getting_started/using_docker_image.rst:29 msgid "" -"For CUDA 12.8, add ``-cu128`` suffix, e.g. ``nightly-main-cu128``. " -"(Xinference version should be between v1.8.1 and v1.15.0)" +"For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. 
" +"(Xinference version should be v1.16.0 at least)" msgstr "" -"对于 CUDA 12.8 版本,增加 ``-cu128`` 后缀,如 ``nightly-main-cu128`` 。(" -"Xinference 版本需要介于 v1.8.1 和 v1.15.0)" +"对于 CUDA 12.9 版本,增加 ``-cu129`` 后缀,如 ``nightly-main-cu129`` 。(Xinference " +"版本需要至少 v1.16.0)" -#: ../../source/getting_started/using_docker_image.rst:30 +#: ../../source/getting_started/using_docker_image.rst:33 msgid "" -"For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. " -"(Xinference version should be v1.16.0 at least)" +"Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images " +"are officially provided." msgstr "" -"对于 CUDA 12.9 版本,增加 ``-cu129`` 后缀,如 ``nightly-main-cu129`` 。(" -"Xinference 版本需要至少 v1.16.0)" +"从 **Xinference v2.0** 开始,官方仅提供 ``-cu129`` 和 ``-cpu`` 镜像。" -#: ../../source/getting_started/using_docker_image.rst:34 +#: ../../source/getting_started/using_docker_image.rst:37 msgid "Dockerfile for custom build" msgstr "自定义镜像" -#: ../../source/getting_started/using_docker_image.rst:35 +#: ../../source/getting_started/using_docker_image.rst:38 msgid "" "If you need to build the Xinference image according to your own " "requirements, the source code for the Dockerfile is located at " @@ -150,63 +139,60 @@ msgid "" " for reference. Please make sure to be in the top-level directory of " "Xinference when using this Dockerfile. 
For example:" msgstr "" -"如果需要安装额外的依赖,可以参考 `xinference/deploy/docker/Dockerfile <" -"https://github.com/xorbitsai/inference/tree/main/xinference/deploy/docker" -"/Dockerfile>`_ 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的" -"根目录下。比如:" +"如果需要安装额外的依赖,可以参考 `xinference/deploy/docker/Dockerfile " +"`_" +" 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的根目录下。比如:" -#: ../../source/getting_started/using_docker_image.rst:46 +#: ../../source/getting_started/using_docker_image.rst:49 msgid "Image usage" msgstr "使用镜像" -#: ../../source/getting_started/using_docker_image.rst:47 +#: ../../source/getting_started/using_docker_image.rst:50 msgid "" "You can start Xinference in the container like this, simultaneously " "mapping port 9997 in the container to port 9998 on the host, enabling " "debug logging, and downloading models from modelscope." msgstr "" -"你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的" -" 9998 端口,并且指定日志级别为 DEBUG,也可以指定需要的环境变量。" +"你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的 9998 端口,并且指定日志级别为 " +"DEBUG,也可以指定需要的环境变量。" -#: ../../source/getting_started/using_docker_image.rst:55 +#: ../../source/getting_started/using_docker_image.rst:58 msgid "" "The option ``--gpus`` is essential and cannot be omitted, because as " "mentioned earlier, the image requires the host machine to have a GPU. " "Otherwise, errors will occur." -msgstr "" -"``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会" -"出现错误。" +msgstr "``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会出现错误。" -#: ../../source/getting_started/using_docker_image.rst:56 +#: ../../source/getting_started/using_docker_image.rst:59 msgid "" "The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command " "cannot be omitted. Otherwise, the host machine may not be able to access " "the port inside the container." 
msgstr "``-H 0.0.0.0`` 也是必须指定的,否则在容器外无法连接到 Xinference 服务。" -#: ../../source/getting_started/using_docker_image.rst:57 +#: ../../source/getting_started/using_docker_image.rst:60 msgid "" "You can add multiple ``-e`` options to introduce multiple environment " "variables." msgstr "可以指定多个 ``-e`` 选项赋值多个环境变量。" -#: ../../source/getting_started/using_docker_image.rst:60 +#: ../../source/getting_started/using_docker_image.rst:63 msgid "" "Certainly, if you prefer, you can also manually enter the docker " "container and start Xinference in any desired way." msgstr "当然,也可以运行容器后,进入容器内手动拉起 Xinference。" -#: ../../source/getting_started/using_docker_image.rst:64 +#: ../../source/getting_started/using_docker_image.rst:67 msgid "" "For multiple GPUs, make sure to set the shared memory size, for example: " "`docker run --shm-size=128g ...`" msgstr "对于多张 GPU,确保设置共享内存大小,例如:`docker run --shm-size=128g ...`" -#: ../../source/getting_started/using_docker_image.rst:68 +#: ../../source/getting_started/using_docker_image.rst:71 msgid "Mount your volume for loading and saving models" msgstr "挂载模型目录" -#: ../../source/getting_started/using_docker_image.rst:69 +#: ../../source/getting_started/using_docker_image.rst:72 msgid "" "The image does not contain any model files by default, and it downloads " "the models into the container. 
Typically, you would need to mount a " @@ -215,11 +201,10 @@ msgid "" "need to specify a volume when running the Docker image and configure " "environment variables for Xinference:" msgstr "" -"默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果" -"需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下," -"需要在运行容器时指定本地卷,并且为 Xinference 配置环境变量。" +"默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下,需要在运行容器时指定本地卷,并且为" +" Xinference 配置环境变量。" -#: ../../source/getting_started/using_docker_image.rst:78 +#: ../../source/getting_started/using_docker_image.rst:81 msgid "" "The principle behind the above command is to mount the specified " "directory from the host machine into the container, and then set the " @@ -230,12 +215,11 @@ msgid "" "time you run it, you can directly use the existing models without the " "need for repetitive downloads." msgstr "" -"上述命令的原理是将主机上指定的目录挂载到容器中,并设置 ``XINFERENCE_HOME`" -"` 环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上" -"指定的目录中。您无需担心在 Docker 容器停止时丢失这些文件,下次运行容器时" -",您可以直接使用现有的模型,无需重复下载。" +"上述命令的原理是将主机上指定的目录挂载到容器中,并设置 ``XINFERENCE_HOME`` " +"环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上指定的目录中。您无需担心在 Docker " +"容器停止时丢失这些文件,下次运行容器时,您可以直接使用现有的模型,无需重复下载。" -#: ../../source/getting_started/using_docker_image.rst:82 +#: ../../source/getting_started/using_docker_image.rst:85 msgid "" "If you downloaded the model using the default path on the host machine, " "and since the xinference cache directory stores the model using symbolic " @@ -246,9 +230,18 @@ msgid "" "HuggingFace and Modelscope are located at /.cache/huggingface " "and /.cache/modelscope. 
The command would be like:" msgstr "" -"如果你在宿主机使用的默认路径下载的模型,由于 xinference cache 目录是用的" -"软链的方式存储模型,需要将原文件所在的目录也挂载到容器内。例如你使用 " -"huggingface 和 modelscope 作为模型仓库,那么需要将这两个对应的目录挂载到" -"容器内,一般对应的 cache 目录分别在 /.cache/huggingface 和 <" -"home_path>/.cache/modelscope,使用的命令如下:" +"如果你在宿主机使用的默认路径下载的模型,由于 xinference cache " +"目录是用的软链的方式存储模型,需要将原文件所在的目录也挂载到容器内。例如你使用 huggingface 和 modelscope " +"作为模型仓库,那么需要将这两个对应的目录挂载到容器内,一般对应的 cache 目录分别在 " +"/.cache/huggingface 和 /.cache/modelscope,使用的命令如下:" + +#~ msgid "" +#~ "For CUDA 12.8, add ``-cu128`` suffix," +#~ " e.g. ``nightly-main-cu128``. (Xinference" +#~ " version should be between v1.8.1 and" +#~ " v1.15.0)" +#~ msgstr "" +#~ "对于 CUDA 12.8 版本,增加 ``-cu128`` 后缀,如 " +#~ "``nightly-main-cu128`` 。(Xinference 版本需要介于 " +#~ "v1.8.1 和 v1.15.0)" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po index aa3fa6a394..ac59faeb6c 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/custom.po @@ -7,7 +7,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-08-29 15:29+0800\n" +"POT-Creation-Date: 2026-01-28 14:31+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -16,7 +16,7 @@ msgstr "" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" -"Generated-By: Babel 2.16.0\n" +"Generated-By: Babel 2.17.0\n" #: ../../source/models/custom.rst:5 msgid "Custom Models" @@ -39,9 +39,8 @@ msgid "" "requires that the model's ``model_family`` is among the built-in " "supported models, and eliminates the hassle of registering the model." 
msgstr "" -"从 ``v0.14.0`` 版本开始,如果你需要注册的模型的家族是 Xinference 内置支持" -"的模型,你可以直接通过 launch 接口中的 ``model_path`` 参数来启动它,从而" -"免去注册步骤的麻烦。现在非常推荐使用这种方式。" +"从 ``v0.14.0`` 版本开始,如果你需要注册的模型的家族是 Xinference 内置支持的模型,你可以直接通过 launch 接口中的 " +"``model_path`` 参数来启动它,从而免去注册步骤的麻烦。现在非常推荐使用这种方式。" #: ../../source/models/custom.rst:15 msgid "For example:" @@ -59,149 +58,196 @@ msgid "" "you can directly launch it using the ``worker_ip`` and ``model_path`` " "parameters with the launch interface." msgstr "" -"对于分布式场景,将你的模型文件置于某个 worker ,然后通过 launch 接口的 ``" -"worker_ip`` 和 ``model_path`` 参数来达到直接 launch 的效果。" +"对于分布式场景,将你的模型文件置于某个 worker ,然后通过 launch 接口的 ``worker_ip`` 和 " +"``model_path`` 参数来达到直接 launch 的效果。" #: ../../source/models/custom.rst:53 -msgid "Define a custom model" +msgid "" +"For CLI usage, prefer ``--model-path`` (kebab-case). ``--model_path`` is " +"legacy-compatible but not recommended." msgstr "" -"定义一个自定义模型" +"命令行(CLI)使用时,请优先使用 ``--model-path``(kebab-case,即短横线分隔的小写形式)。" +"``--model_path`` 兼容旧版写法,但不推荐使用。" -#: ../../source/models/custom.rst:55 -msgid "Define a custom model based on the following templates:" -msgstr "" -"基于以下模板定义一个自定义模型:" +#: ../../source/models/custom.rst:56 +msgid "Define a custom model" +msgstr "定义一个自定义模型" #: ../../source/models/custom.rst:59 +msgid "Web UI: Automatic LLM Config Parsing" +msgstr "Web UI:自动解析大型语言模型配置" + +#: ../../source/models/custom.rst:61 +msgid "" +"When registering a custom LLM via the Web UI, Xinference can " +"automatically parse the model configuration and pre-fill key fields for " +"you." 
+msgstr "通过 Web UI 注册自定义 LLM 时,Xinference 可自动解析模型配置并为您预填关键字段。" + +#: ../../source/models/custom.rst:64 +msgid "You only need to provide:" +msgstr "您仅需要提供:" + +#: ../../source/models/custom.rst:66 +msgid "**Model path / Model ID** (where the model lives, local path or hub ID)" +msgstr " **模型路径 / 模型 ID** (模型所在位置:本地路径或模型仓库 ID)" + +#: ../../source/models/custom.rst:67 +msgid "**Model Family**" +msgstr " **模型家族** " + +#: ../../source/models/custom.rst:69 +msgid "After parsing, the UI can auto-populate fields such as:" +msgstr "解析后,用户界面可自动填充以下字段:" + +#: ../../source/models/custom.rst:71 +msgid "``Context Length``" +msgstr " ``上下文长度`` " + +#: ../../source/models/custom.rst:72 +msgid "``Model_Languages``" +msgstr " ``模型语言`` " + +#: ../../source/models/custom.rst:73 +msgid "``Model_Abilities``" +msgstr " ``模型能力`` " + +#: ../../source/models/custom.rst:74 +msgid "``Model_Specs``" +msgstr " ``模型规格`` " + +#: ../../source/models/custom.rst:76 +msgid "You can review and edit these fields before saving the custom model." +msgstr "在保存自定义模型之前,您可以查看并编辑这些字段。" + +#: ../../source/models/custom.rst:78 +msgid "Define a custom model based on the following templates:" +msgstr "基于以下模板定义一个自定义模型:" + +#: ../../source/models/custom.rst:82 msgid "LLM" msgstr "语言模型" -#: ../../source/models/custom.rst:106 +#: ../../source/models/custom.rst:129 msgid "embedding" msgstr "嵌入模型" -#: ../../source/models/custom.rst:141 +#: ../../source/models/custom.rst:164 msgid "Rerank" msgstr "重排序模型" -#: ../../source/models/custom.rst:176 +#: ../../source/models/custom.rst:199 msgid "image" msgstr "图像模型" -#: ../../source/models/custom.rst:211 +#: ../../source/models/custom.rst:234 msgid "audio" msgstr "音频模型" -#: ../../source/models/custom.rst:244 +#: ../../source/models/custom.rst:267 msgid "flexible" msgstr "灵活模型" -#: ../../source/models/custom.rst:271 +#: ../../source/models/custom.rst:294 msgid "" "model_name: A string defining the name of the model. 
The name must start " "with a letter or a digit and can only contain letters, digits, " "underscores, or dashes." -msgstr "" -"model_name: 模型名称。名称必须以字母或数字开头,且只能包含字母、数字、" -"下划线或短划线。" +msgstr "model_name: 模型名称。名称必须以字母或数字开头,且只能包含字母、数字、下划线或短划线。" -#: ../../source/models/custom.rst:272 +#: ../../source/models/custom.rst:295 msgid "" "context_length: An optional integer that specifies the maximum context " "size the model was trained to accommodate, encompassing both the input " "and output lengths. If not defined, the default value is 2048 tokens " "(~1,500 words)." -msgstr "context_length: 一个可选的整数,模型支持的最大上下文长度,包括输入和输出长度。如果未定义,默认值为2048个token(约1,500个词)。" +msgstr "" +"context_length: " +"一个可选的整数,模型支持的最大上下文长度,包括输入和输出长度。如果未定义,默认值为2048个token(约1,500个词)。" -#: ../../source/models/custom.rst:273 +#: ../../source/models/custom.rst:296 msgid "" "dimensions: An interger defining the size of the vector output by the " "embedding model." -msgstr "" -"dimensions: 一个整数,用于定义嵌入模型输出的向量大小。" +msgstr "dimensions: 一个整数,用于定义嵌入模型输出的向量大小。" -#: ../../source/models/custom.rst:274 +#: ../../source/models/custom.rst:297 msgid "" "max_tokens: An interger defining the maximum number of input tokens the " "embedding model can process in a single request." -msgstr "" -"max_tokens: 一个整数,定义嵌入模型在单次请求中可处理的最大输入token数量。" +msgstr "max_tokens: 一个整数,定义嵌入模型在单次请求中可处理的最大输入token数量。" -#: ../../source/models/custom.rst:275 +#: ../../source/models/custom.rst:298 msgid "" "model_lang: A list of strings representing the supported languages for " "the model. Example: [\"en\"], which means that the model supports " "English." -msgstr "" -"model_lang: 一个字符串列表,表示模型支持的语言。例如:['en'],表示该模型" -"支持英语。" +msgstr "model_lang: 一个字符串列表,表示模型支持的语言。例如:['en'],表示该模型支持英语。" -#: ../../source/models/custom.rst:276 +#: ../../source/models/custom.rst:299 msgid "" "model_ability: A list of strings defining the abilities of the model. It " "could include options like \"embed\", \"generate\", and \"chat\". 
In this" " case, the model has the ability to \"generate\"." msgstr "" -"model_ability: 一个字符串列表,定义模型的能力。它可以包括像 'embed'、'" -"generate' 和 'chat' 这样的选项。示例表示模型具有 'generate' 的能力。" +"model_ability: 一个字符串列表,定义模型的能力。它可以包括像 'embed'、'generate' 和 'chat' " +"这样的选项。示例表示模型具有 'generate' 的能力。" -#: ../../source/models/custom.rst:277 +#: ../../source/models/custom.rst:300 msgid "" "model_family: A required string representing the family of the model you " "want to register. This parameter must not conflict with any builtin model" " names." -msgstr "" -"model_family: 一个必要的字符串,表示要注册的模型族。" -"该参数名称不得与任何内置模型名称冲突。" +msgstr "model_family: 一个必要的字符串,表示要注册的模型族。该参数名称不得与任何内置模型名称冲突。" -#: ../../source/models/custom.rst:278 +#: ../../source/models/custom.rst:301 msgid "" "model_specs: An array of objects defining the specifications of the " "model. These include:" msgstr "model_specs: 一个包含定义模型规格的对象数组。这些规格包括:" -#: ../../source/models/custom.rst:279 +#: ../../source/models/custom.rst:302 msgid "" "model_format: A string that defines the model format, like \"pytorch\" or" " \"ggufv2\"." msgstr "model_format: 一个定义模型格式的字符串,可以是 'pytorch' 或 'ggufv2'。" -#: ../../source/models/custom.rst:280 +#: ../../source/models/custom.rst:303 msgid "" "model_size_in_billions: An integer defining the size of the model in " "billions of parameters." msgstr "model_size_in_billions: 一个整数,定义模型的参数量,以十亿为单位。" -#: ../../source/models/custom.rst:281 +#: ../../source/models/custom.rst:304 msgid "" "quantizations: A list of strings defining the available quantizations for" " the model. For PyTorch models, it could be \"4-bit\", \"8-bit\", or " "\"none\". For ggufv2 models, the quantizations should correspond to " -"values that work with the ``model_file_name_template``." +"values that work with the ``model_file_name_template``. Some engines also" +" support ``fp4`` / ``fp8`` / ``bnb`` formats (see :ref:`installation` for" +" backend support details)." 
msgstr "" -"quantizations: 一个字符串列表,定义模型的量化方式。对于 PyTorch 模型,它" -"可以是 \"4-bit\"、\"8-bit\" 或 \"none\"。对于 ggufv2 模型,量化方式应与 `" -"`model_file_name_template`` 中的值对应。" +"quantizations: 一个字符串列表,定义模型的量化方式。对于 PyTorch 模型,它可以是 \"4-bit\"、\"8-bit\" 或" +" \"none\"。对于 ggufv2 模型,量化方式应与 ``model_file_name_template`` 中的值对应。" +"某些引擎还支持 ``fp4`` / ``fp8`` / ``bnb`` 格式(后端支持详情请参见 :ref:`installation` )。" -#: ../../source/models/custom.rst:282 +#: ../../source/models/custom.rst:306 msgid "" "model_id: A string representing the model ID, possibly referring to an " "identifier used by Hugging Face. **If model_uri is missing, Xinference " "will try to download the model from the huggingface repository specified " "here.**." msgstr "" -"model_id:代表模型 id 的字符串,可以是该模型对应的 HuggingFace 仓库 id。" -"如果 model_uri 字段缺失,Xinference 将尝试从此id指示的HuggingFace仓库下载" -"该模型。" +"model_id:代表模型 id 的字符串,可以是该模型对应的 HuggingFace 仓库 id。如果 model_uri " +"字段缺失,Xinference 将尝试从此id指示的HuggingFace仓库下载该模型。" -#: ../../source/models/custom.rst:283 +#: ../../source/models/custom.rst:307 msgid "" "model_hub: A string representing where to download the model from, like " "\"Huggingface\" or \"modelscope\"" -msgstr "" -"model_hub: 一个可选字符串,表示从何处下载模型,例如 HuggingFace 或 modelscope。" +msgstr "model_hub: 一个可选字符串,表示从何处下载模型,例如 HuggingFace 或 modelscope。" -#: ../../source/models/custom.rst:284 +#: ../../source/models/custom.rst:308 msgid "" "model_uri: A string representing the URI where the model can be loaded " "from, such as \"file:///path/to/llama-2-7b\". **When the model format is " @@ -210,30 +256,28 @@ msgid "" "model files.** If model URI is absent, Xinference will try to download " "the model from Hugging Face with the model ID." 
msgstr "" -"model_uri:表示模型文件位置的字符串,例如本地目录:\"file:///path/to/" -"llama-2-7b\"。当 model_format 是 ggufv2 ,此字段必须是具体的模型文件路径" -"。而当 model_format 是 pytorch 时,此字段必须是一个包含所有模型文件的目录" -"。" +"model_uri:表示模型文件位置的字符串,例如本地目录:\"file:///path/to/llama-2-7b\"。当 " +"model_format 是 ggufv2 ,此字段必须是具体的模型文件路径。而当 model_format 是 pytorch " +"时,此字段必须是一个包含所有模型文件的目录。" -#: ../../source/models/custom.rst:285 +#: ../../source/models/custom.rst:309 msgid "" "model_revision: A string representing the specific version or commit hash" " of the model files to use from the repository." -msgstr "" -"model_revision: 一个字符串,表示从存储库中使用的模型文件的具体版本或提交哈希值。" +msgstr "model_revision: 一个字符串,表示从存储库中使用的模型文件的具体版本或提交哈希值。" -#: ../../source/models/custom.rst:286 +#: ../../source/models/custom.rst:310 msgid "" "chat_template: If ``model_ability`` includes ``chat`` , you must " "configure this option to generate the correct full prompt during chat. " "This is a Jinja template string. Usually, you can find it in the " "``tokenizer_config.json`` file within the model directory." msgstr "" -"chat_template:如果 ``model_ability`` 中包含 ``chat`` ,那么此选项必须" -"配置以生成合适的完整提示词。这是一个 Jinja 模版字符串。通常,你可以在模型" -"目录的 ``tokenizer_config.json`` 文件中找到。" +"chat_template:如果 ``model_ability`` 中包含 ``chat`` " +",那么此选项必须配置以生成合适的完整提示词。这是一个 Jinja 模版字符串。通常,你可以在模型目录的 " +"``tokenizer_config.json`` 文件中找到。" -#: ../../source/models/custom.rst:287 +#: ../../source/models/custom.rst:311 msgid "" "stop_token_ids: If ``model_ability`` includes ``chat`` , you can " "configure this option to control when the model stops during chat. This " @@ -241,12 +285,11 @@ msgid "" "values from the ``generation_config.json`` or ``tokenizer_config.json`` " "file in the model directory." 
msgstr "" -"stop_token_ids:如果 ``model_ability`` 中包含 ``chat`` ,那么推荐配置此" -"选项以合理控制对话的停止。这是一个包含整数的列表,你可以在模型目录的 ``" -"generation_config.json`` 和 ``tokenizer_config.json`` 文件中提取相应的值" -"。" +"stop_token_ids:如果 ``model_ability`` 中包含 ``chat`` " +",那么推荐配置此选项以合理控制对话的停止。这是一个包含整数的列表,你可以在模型目录的 ``generation_config.json`` 和 " +"``tokenizer_config.json`` 文件中提取相应的值。" -#: ../../source/models/custom.rst:288 +#: ../../source/models/custom.rst:312 msgid "" "stop: If ``model_ability`` includes ``chat`` , you can configure this " "option to control when the model stops during chat. This is a list of " @@ -254,101 +297,92 @@ msgid "" "``generation_config.json`` or ``tokenizer_config.json`` file in the model" " directory." msgstr "" -"stop:如果 ``model_ability`` 中包含 ``chat`` ,那么推荐配置此选项以合理" -"控制对话的停止。这是一个包含字符串的列表,你可以在模型目录的 ``tokenizer_" -"config.json`` 文件中找到 token 值对应的字符串。" +"stop:如果 ``model_ability`` 中包含 ``chat`` " +",那么推荐配置此选项以合理控制对话的停止。这是一个包含字符串的列表,你可以在模型目录的 ``tokenizer_config.json`` " +"文件中找到 token 值对应的字符串。" -#: ../../source/models/custom.rst:289 +#: ../../source/models/custom.rst:313 msgid "" "reasoning_start_tag: A special token or prompt used to explicitly " "instruct the LLM to begin its chain-of-thought or reasoning process in " "its output." -msgstr "" -"reasoning_start_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的起点。" +msgstr "reasoning_start_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的起点。" -#: ../../source/models/custom.rst:290 +#: ../../source/models/custom.rst:314 msgid "" "reasoning_end_tag: A special token or prompt used to explicitly mark the " "end of the model's chain-of-thought or reasoning process in its output." 
-msgstr "" -"reasoning_end_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的终点。" +msgstr "reasoning_end_tag: 一个特殊的 token 或 prompt,用于明确指示大语言模型在其输出中思维链或推理过程的终点。" -#: ../../source/models/custom.rst:291 +#: ../../source/models/custom.rst:315 msgid "" "cache_config: A string representing the parameters and rules for how the " "system stores and manages temporary data (cache)." -msgstr "" -"cache_config: 一个字符串,表示系统存储和管理临时数据(缓存)的参数。" +msgstr "cache_config: 一个字符串,表示系统存储和管理临时数据(缓存)的参数。" -#: ../../source/models/custom.rst:292 +#: ../../source/models/custom.rst:316 msgid "" -"virtualenv: An array refers to the name or path of a self-contained " -"Python environment used to isolate dependencies required to run a " -"specific model or project. Please refer to :ref:`this document " -"`." +"virtualenv: A settings object for model dependency isolation. Please " +"refer to :ref:`this document ` for details." msgstr "" -"virtualenv: 一个数组,指代用于隔离特定模型或项目运行所依赖的独立环境名称或路径。" -"详情请阅读 :ref:`这个文档 `。" +"virtualenv:一个用于模型依赖隔离的配置对象。详情请参阅 :ref:`这个文档 `。" - -#: ../../source/models/custom.rst:295 +#: ../../source/models/custom.rst:319 msgid "Register a Custom Model" msgstr "注册一个自定义模型" -#: ../../source/models/custom.rst:297 +#: ../../source/models/custom.rst:321 msgid "Register a custom model programmatically:" msgstr "以代码的方式注册自定义模型" -#: ../../source/models/custom.rst:312 ../../source/models/custom.rst:330 -#: ../../source/models/custom.rst:345 ../../source/models/custom.rst:400 +#: ../../source/models/custom.rst:336 ../../source/models/custom.rst:354 +#: ../../source/models/custom.rst:369 ../../source/models/custom.rst:424 msgid "Or via CLI:" msgstr "以命令行的方式" -#: ../../source/models/custom.rst:318 +#: ../../source/models/custom.rst:342 msgid "" "Note that replace the ```` above with ``LLM``, ``embedding`` " "or ``rerank``. The same as below." 
-msgstr "" -"注意将以下部分的 ```` 替换为 ``LLM``、``embedding`` 或 ``" -"rerank`` 。" +msgstr "注意将以下部分的 ```` 替换为 ``LLM``、``embedding`` 或 ``rerank`` 。" -#: ../../source/models/custom.rst:322 +#: ../../source/models/custom.rst:346 msgid "List the Built-in and Custom Models" msgstr "列举内置和自定义模型" -#: ../../source/models/custom.rst:324 +#: ../../source/models/custom.rst:348 msgid "List built-in and custom models programmatically:" msgstr "以代码的方式列举内置和自定义模型" -#: ../../source/models/custom.rst:337 +#: ../../source/models/custom.rst:361 msgid "Launch the Custom Model" msgstr "启动自定义模型" -#: ../../source/models/custom.rst:339 +#: ../../source/models/custom.rst:363 msgid "Launch the custom model programmatically:" msgstr "以代码的方式启动自定义模型" -#: ../../source/models/custom.rst:352 +#: ../../source/models/custom.rst:376 msgid "Interact with the Custom Model" msgstr "使用自定义模型" -#: ../../source/models/custom.rst:354 +#: ../../source/models/custom.rst:378 msgid "Invoke the model programmatically:" msgstr "以代码的方式调用模型" -#: ../../source/models/custom.rst:361 +#: ../../source/models/custom.rst:385 msgid "Result:" msgstr "结果为:" -#: ../../source/models/custom.rst:385 +#: ../../source/models/custom.rst:409 +#, python-brace-format msgid "Or via CLI, replace ``${UID}`` with real model UID:" msgstr "或者以命令行的方式,用实际的模型 UID 替换 ``${UID}``:" -#: ../../source/models/custom.rst:392 +#: ../../source/models/custom.rst:416 msgid "Unregister the Custom Model" msgstr "注销自定义模型" -#: ../../source/models/custom.rst:394 +#: ../../source/models/custom.rst:418 msgid "Unregister the custom model programmatically:" msgstr "以代码的方式注销自定义模型" @@ -362,10 +396,8 @@ msgstr "以代码的方式注销自定义模型" #~ "file, do not fill in the specific" #~ " path of the model file.**" #~ msgstr "" -#~ "model_file_name_template: gguf 模型" -#~ "所需。一个 f-string 模板,用于根据" -#~ "量化定义模型文件名。注意,这里不要填入" -#~ "文件的路径。" +#~ "model_file_name_template: gguf 模型所需。一个 f-string " +#~ "模板,用于根据量化定义模型文件名。注意,这里不要填入文件的路径。" #~ msgid "Define a custom embedding model" #~ msgstr "定义自定义 embedding 模型" 
@@ -387,19 +419,13 @@ msgstr "以代码的方式注销自定义模型" #~ " the supported languages for the " #~ "model. Example: [\"en\"], which means " #~ "that the model supports English." -#~ msgstr "" -#~ "model_lang: 一个字符串列表,表示模型" -#~ "支持的语言。例如:['en'],表示" -#~ "该模型支持英语。" +#~ msgstr "model_lang: 一个字符串列表,表示模型支持的语言。例如:['en'],表示该模型支持英语。" #~ msgid "" #~ "model_id: A string representing the " #~ "model ID, possibly referring to an " #~ "identifier used by Hugging Face." -#~ msgstr "" -#~ "model_id: 一个表示模型标识的字符串," -#~ "类似 HuggingFace 或 ModelScope 使用的" -#~ "标识符。" +#~ msgstr "model_id: 一个表示模型标识的字符串,类似 HuggingFace 或 ModelScope 使用的标识符。" #~ msgid "" #~ "model_uri: A string representing the URI" @@ -410,11 +436,10 @@ msgstr "以代码的方式注销自定义模型" #~ "from Hugging Face with the model " #~ "ID." #~ msgstr "" -#~ "model_uri: 表示模型的 URI 的字符串" -#~ ",例如 \"file:///path/to/llama" -#~ "-2-7b\"。如果模型 URI 不存在," -#~ "Xinference 将尝试使用 model_id 从 " -#~ "HuggingFace 或 ModelScope 下载模型。" +#~ "model_uri: 表示模型的 URI 的字符串,例如 " +#~ "\"file:///path/to/llama-2-7b\"。如果模型 URI 不存在,Xinference " +#~ "将尝试使用 model_id 从 HuggingFace 或 " +#~ "ModelScope 下载模型。" #~ msgid "Define a custom Rerank model" #~ msgstr "定义自定义 rerank 模型" @@ -426,8 +451,34 @@ msgstr "以代码的方式注销自定义模型" #~ "type: A string defining the type " #~ "of the model, including ``normal``, " #~ "``LLM-based`` and ``LLM-based layerwise``." +#~ msgstr "type: 表示模型的类型,可选值包括 ``normal``、``LLM-based`` 和 ``LLM-based layerwise``。" + +#~ msgid "" +#~ "virtualenv: An array refers to the " +#~ "name or path of a self-contained" +#~ " Python environment used to isolate " +#~ "dependencies required to run a specific" +#~ " model or project. Please refer to" +#~ " :ref:`this document `." 
+#~ msgstr "" +#~ "virtualenv: 一个数组,指代用于隔离特定模型或项目运行所依赖的独立环境名称或路径。详情请阅读 " +#~ ":ref:`这个文档 `。" + +#~ msgid "**Model engine** (e.g., transformers / vllm / sglang)" +#~ msgstr "" + +#~ msgid "``model_family`` / ``model_name``" +#~ msgstr "" + +#~ msgid "``model_format``" +#~ msgstr "" + +#~ msgid "``model_size_in_billions``" +#~ msgstr "" + +#~ msgid "``quantization`` (if detectable)" +#~ msgstr "" + +#~ msgid "``architectures`` and other model metadata (when available)" #~ msgstr "" -#~ "type: 表示模型的类型,可选值包括 `" -#~ "`normal``、``LLM-based`` 和 " -#~ "``LLM-based layerwise``。" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index ab2c55cb92..02a3919401 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-11-18 11:54+0800\n" +"POT-Creation-Date: 2026-01-28 15:00+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -37,10 +37,9 @@ msgid "" "latest version of ``transformers``. This version mismatch leads to " "dependency conflicts." msgstr "" -"一些模型在发布后不再维护,其依赖的库版本也保持在较旧的状态。例如,``GOT-" -"OCR2`` 模型仍依赖于 ``transformers`` 4.37.2。如果将该库升级为新版本,模型" -"将无法正常运行;而许多新模型又需要最新版本的 ``transformers``。这种版本" -"差异会导致依赖冲突。" +"一些模型在发布后不再维护,其依赖的库版本也保持在较旧的状态。例如,``GOT-OCR2`` 模型仍依赖于 ``transformers`` " +"4.37.2。如果将该库升级为新版本,模型将无法正常运行;而许多新模型又需要最新版本的 " +"``transformers``。这种版本差异会导致依赖冲突。" #: ../../source/models/virtualenv.rst:19 msgid "Solution" @@ -62,8 +61,8 @@ msgid "" "``XINFERENCE_ENABLE_VIRTUAL_ENV=1``." 
msgstr "通过设置环境变量 ``XINFERENCE_ENABLE_VIRTUAL_ENV=1`` 启用该功能。"

-#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:172
-#: ../../source/models/virtualenv.rst:188
+#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:209
+#: ../../source/models/virtualenv.rst:225
 msgid "Example usage:"
 msgstr "使用示例:"

@@ -77,56 +76,75 @@ msgstr "Xinference 默认会继承当前 pip 的配置。"

 #: ../../source/models/virtualenv.rst:52
 msgid ""
-"The model virtual environment feature is disabled by default (i.e., "
-"XINFERENCE_ENABLE_VIRTUAL_ENV is set to 0)."
+"Starting from **Xinference v2.0**, the model virtual environment feature "
+"is enabled by default (i.e., ``XINFERENCE_ENABLE_VIRTUAL_ENV`` defaults "
+"to ``1``)."
 msgstr ""
-"模型虚拟空间功能默认处于关闭状态(即 ``XINFERENCE_ENABLE_VIRTUAL_ENV`` 的"
-"默认值为 0)。"
+"从 **Xinference v2.0** 开始,模型虚拟环境功能默认启用(即 ``XINFERENCE_ENABLE_VIRTUAL_ENV``"
+" 默认值为 ``1`` )。"

-#: ../../source/models/virtualenv.rst:54
-msgid "It will be enabled by default starting from Xinference v2.0.0."
-msgstr "该功能将在 Xinference v2.0.0 起默认开启。"
+#: ../../source/models/virtualenv.rst:55
+msgid ""
+"To disable it globally, set ``XINFERENCE_ENABLE_VIRTUAL_ENV=0`` when "
+"starting Xinference."
+msgstr "要全局禁用该功能,请在启动 Xinference 时设置 ``XINFERENCE_ENABLE_VIRTUAL_ENV=0`` 。"

-#: ../../source/models/virtualenv.rst:56
+#: ../../source/models/virtualenv.rst:57
 msgid ""
 "When enabled, Xinference will automatically create a dedicated virtual "
 "environment for each model when it is loaded, and install its specific "
 "dependencies there. This prevents dependency conflicts between models, "
 "allowing them to run in isolation without affecting one another."
 msgstr ""
-"启用该功能后,Xinference 会在加载模型时自动为其创建专属的虚拟环境,并在"
-"其中安装对应依赖。这可避免模型之间的依赖冲突,确保各模型在相互隔离的环境"
-"中独立运行。"
-
-#: ../../source/models/virtualenv.rst:61
-msgid "Supported Models"
-msgstr "支持的模型"
+"启用该功能后,Xinference "
+"会在加载模型时自动为其创建专属的虚拟环境,并在其中安装对应依赖。这可避免模型之间的依赖冲突,确保各模型在相互隔离的环境中独立运行。"

-#: ../../source/models/virtualenv.rst:63
-msgid "Currently, this feature supports the following models:"
-msgstr "当前,该功能支持以下模型:"
+#: ../../source/models/virtualenv.rst:62
+msgid "Using Virtual Environments (v2.0)"
+msgstr "使用虚拟环境(v2.0)"

 #: ../../source/models/virtualenv.rst:65
-msgid ":ref:`GOT-OCR2 `"
-msgstr ":ref:`GOT-OCR2 `"
-
-#: ../../source/models/virtualenv.rst:66
-msgid ":ref:`Qwen2.5-omni `"
-msgstr ":ref:`Qwen2.5-omni `"
+msgid "Global toggle"
+msgstr "全局切换"

 #: ../../source/models/virtualenv.rst:67
-msgid "... (New models since v1.5.0 will all consider to add support)"
-msgstr "……(自 v1.5.0 起的新模型都会考虑支持该功能)"
+msgid ""
+"Virtual environments are enabled by default starting from v2.0. You can "
+"still override this globally:"
+msgstr "从 v2.0 版本开始,虚拟环境默认处于启用状态。您仍可通过全局设置覆盖此选项:"
+
+#: ../../source/models/virtualenv.rst:78
+msgid "Per-model override at launch time"
+msgstr "启动时按模型覆盖"
+
+#: ../../source/models/virtualenv.rst:80
+msgid "You can override the global setting when launching a model:"
+msgstr "在启动模型时,您可以覆盖全局设置:"
+
+#: ../../source/models/virtualenv.rst:91
+msgid "Add or override packages at launch time"
+msgstr "在启动时添加或覆盖包"
+
+#: ../../source/models/virtualenv.rst:93
+msgid "Use ``--virtual-env-package`` (or ``-vp``) multiple times:"
+msgstr "多次使用 ``--virtual-env-package`` (或 ``-vp`` ):"

-#: ../../source/models/virtualenv.rst:70
+#: ../../source/models/virtualenv.rst:101
+msgid ""
+"If you specify a package that already exists in the model's default "
+"virtualenv package list, your version replaces the default instead of "
+"being appended."
+msgstr "若指定的软件包已在模型的默认虚拟环境软件包列表中存在,则您指定的版本将覆盖默认版本,而非追加至列表中。" + +#: ../../source/models/virtualenv.rst:106 msgid "Storage Location" msgstr "存储位置" -#: ../../source/models/virtualenv.rst:72 +#: ../../source/models/virtualenv.rst:108 msgid "By default, the model’s virtual environment is stored under path:" msgstr "默认情况下,模型的虚拟环境存储在以下路径" -#: ../../source/models/virtualenv.rst:74 +#: ../../source/models/virtualenv.rst:110 #, python-brace-format msgid "" "Before v1.6.0: :ref:`XINFERENCE_HOME ` / " @@ -135,16 +153,16 @@ msgstr "" "在 v1.6.0 之前::ref:`XINFERENCE_HOME ` / " "virtualenv / {model_name}" -#: ../../source/models/virtualenv.rst:75 +#: ../../source/models/virtualenv.rst:111 #, python-brace-format msgid "" "From v1.6.0 to v1.13.0: :ref:`XINFERENCE_HOME " "` / virtualenv / v2 / {model_name}" msgstr "" -"从 v1.6.0 到 v1.13.0::ref:`XINFERENCE_HOME ` / " -"virtualenv / v2 / {model_name}" +"从 v1.6.0 到 v1.13.0::ref:`XINFERENCE_HOME ` " +"/ virtualenv / v2 / {model_name}" -#: ../../source/models/virtualenv.rst:76 +#: ../../source/models/virtualenv.rst:112 #, python-brace-format msgid "" "Since v1.14.0: :ref:`XINFERENCE_HOME ` / " @@ -153,23 +171,26 @@ msgstr "" "从 v1.14.0 开始::ref:`XINFERENCE_HOME ` / " "virtualenv / v3 / {model_name} / {python_version}" -#: ../../source/models/virtualenv.rst:79 +#: ../../source/models/virtualenv.rst:113 +#, python-brace-format +msgid "" +"Since v2.0: :ref:`XINFERENCE_HOME ` / " +"virtualenv / v4 / {model_name} / {model_engine} / {python_version}" +msgstr "" +"自 v2.0 起::ref:`XINFERENCE_HOME ` / " +"virtualenv / v4 / {model_name} / {model_engine} / {python_version}" + +#: ../../source/models/virtualenv.rst:116 msgid "Experimental Feature" msgstr "实验功能" -#: ../../source/models/virtualenv.rst:84 -msgid "Skip Installed Libraries" -msgstr "跳过已安装的库" - -#: ../../source/models/virtualenv.rst:88 +#: ../../source/models/virtualenv.rst:125 msgid "" "This feature requires ``xoscar >= 0.7.12``, which is the minimum Xoscar " "version required for Xinference 
v1.8.1." -msgstr "" -"此功能要求 ``xoscar >= 0.7.12``,这是 Xinference v1.8.1 需要的最低 Xoscar" -" 版本。" +msgstr "此功能要求 ``xoscar >= 0.7.12``,这是 Xinference v1.8.1 需要的最低 Xoscar 版本。" -#: ../../source/models/virtualenv.rst:90 +#: ../../source/models/virtualenv.rst:127 msgid "" "``xinference`` uses the ``uv`` tool to create virtual environments, with " "the current Python **system site-packages** set as the base environment. " @@ -178,207 +199,287 @@ msgid "" " This ensures better isolation from system packages but can result in " "redundant installations, longer setup times, and increased disk usage." msgstr "" -"``xinference`` 使用 ``uv`` 工具创建虚拟环境,并将当前 Python 的 **system " -"site-packages** 设置为基础环境。默认情况下,``uv`` **不会检查系统环境中" -"是否已有包**,而是会在虚拟环境中重新安装所有依赖。这种方式可以更好地与" -"系统包隔离,但可能导致重复安装、初始化时间变长以及磁盘占用增加。" +"``xinference`` 使用 ``uv`` 工具创建虚拟环境,并将当前 Python 的 **system site-packages** " +"设置为基础环境。默认情况下,``uv`` " +"**不会检查系统环境中是否已有包**,而是会在虚拟环境中重新安装所有依赖。这种方式可以更好地与系统包隔离,但可能导致重复安装、初始化时间变长以及磁盘占用增加。" -#: ../../source/models/virtualenv.rst:94 +#: ../../source/models/virtualenv.rst:131 msgid "" "Starting from ``v1.8.1``, an **experimental feature** is available: by " "setting the environment variable " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``, ``uv`` will **skip packages " "already available in system site-packages**." msgstr "" -"从 ``v1.8.1`` 开始,提供了一个 **实验功能**:通过设置环境变量 ``" -"XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``,``uv`` 将会 **跳过系统 site-" +"从 ``v1.8.1`` 开始,提供了一个 **实验功能**:通过设置环境变量 " +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``,``uv`` 将会 **跳过系统 site-" "packages 中已存在的包**。" -#: ../../source/models/virtualenv.rst:99 +#: ../../source/models/virtualenv.rst:136 msgid "" "The feature is currently disabled but will be enabled by default in " "``v2.0.0``." 
msgstr "该功能当前默认关闭,但将在 ``v2.0.0`` 版本中默认启用。" -#: ../../source/models/virtualenv.rst:102 +#: ../../source/models/virtualenv.rst:139 msgid "Advantages" msgstr "优势" -#: ../../source/models/virtualenv.rst:104 +#: ../../source/models/virtualenv.rst:141 msgid "" "Avoid redundant installations of large dependencies (e.g., ``torch`` + " "``CUDA``)." msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA``)。" -#: ../../source/models/virtualenv.rst:105 +#: ../../source/models/virtualenv.rst:142 msgid "Speed up virtual environment creation." msgstr "加快虚拟环境创建速度。" -#: ../../source/models/virtualenv.rst:106 +#: ../../source/models/virtualenv.rst:143 msgid "Reduce disk usage." msgstr "减少磁盘空间占用。" -#: ../../source/models/virtualenv.rst:109 +#: ../../source/models/virtualenv.rst:146 msgid "Usage" msgstr "使用" -#: ../../source/models/virtualenv.rst:121 +#: ../../source/models/virtualenv.rst:158 msgid "Performance Comparison" msgstr "性能对比" -#: ../../source/models/virtualenv.rst:123 +#: ../../source/models/virtualenv.rst:160 msgid "Using the ``CosyVoice 0.5B`` model as an example:" msgstr "以 ``CosyVoice 0.5B`` 模型为例:" -#: ../../source/models/virtualenv.rst:125 +#: ../../source/models/virtualenv.rst:162 msgid "**Without this feature enabled**::" msgstr "**未开启该功能时**::" -#: ../../source/models/virtualenv.rst:136 +#: ../../source/models/virtualenv.rst:173 msgid "**With this feature enabled**::" msgstr "**开启该功能后**::" -#: ../../source/models/virtualenv.rst:151 +#: ../../source/models/virtualenv.rst:188 msgid "Model Launching: Toggle Virtual Environments and Customize Dependencies" msgstr "模型加载:开关虚拟环境并自定义依赖" -#: ../../source/models/virtualenv.rst:155 +#: ../../source/models/virtualenv.rst:192 msgid "" "Starting from v1.8.1, we support toggling the virtual environment for " "individual model launching, as well as overriding the model's default " "settings with custom package dependencies." 
-msgstr "" -"从 v1.8.1 开始,我们支持对单个模型加载开关虚拟环境,并用自定义包依赖覆盖" -"模型的默认设置。" +msgstr "从 v1.8.1 开始,我们支持对单个模型加载开关虚拟环境,并用自定义包依赖覆盖模型的默认设置。" -#: ../../source/models/virtualenv.rst:159 +#: ../../source/models/virtualenv.rst:196 msgid "Toggle Virtual Environment" msgstr "开关模型虚拟空间" -#: ../../source/models/virtualenv.rst:161 +#: ../../source/models/virtualenv.rst:198 msgid "" "When loading a model, you can specify whether to enable the model's " "virtual environment. If not specified, the setting will follow the " "environment variable configuration." -msgstr "" -"加载模型时,可以指定是否启用模型的虚拟环境。如果未指定,则默认遵循" -"环境变量的配置。" +msgstr "加载模型时,可以指定是否启用模型的虚拟环境。如果未指定,则默认遵循环境变量的配置。" -#: ../../source/models/virtualenv.rst:164 +#: ../../source/models/virtualenv.rst:201 msgid "" "For the Web UI, this can be toggled on or off through the optional " "settings switch." msgstr "在 Web UI 中,可以通过可选设置开关打开或关闭该功能。" -#: ../../source/models/virtualenv.rst:170 +#: ../../source/models/virtualenv.rst:207 msgid "" "For command-line loading, use the ``--enable-virtual-env`` option to " "enable the virtual environment, or ``--disable-virtual-env`` to disable " "it." msgstr "" -"命令行加载时,使用 ``--enable-virtual-env`` 选项启用虚拟环境,使用 ``--" -"disable-virtual-env`` 选项禁用虚拟环境。" +"命令行加载时,使用 ``--enable-virtual-env`` 选项启用虚拟环境,使用 ``--disable-virtual-env`` " +"选项禁用虚拟环境。" -#: ../../source/models/virtualenv.rst:179 +#: ../../source/models/virtualenv.rst:216 msgid "Set Virtual Environment Package Dependencies" msgstr "设置虚拟环境包依赖" -#: ../../source/models/virtualenv.rst:181 +#: ../../source/models/virtualenv.rst:218 msgid "" "For supported models, Xinference has already defined the package " "dependencies and version requirements within the virtual environment. " "However, if you need to specify particular versions or install additional" " dependencies, you can manually provide them during model loading." 
-msgstr "" -"对于支持的模型,Xinference 已经在虚拟环境中定义了包依赖和版本要求。但如果" -"需要指定特定版本或安装额外依赖,可以在加载模型时手动提供。" +msgstr "对于支持的模型,Xinference 已经在虚拟环境中定义了包依赖和版本要求。但如果需要指定特定版本或安装额外依赖,可以在加载模型时手动提供。" -#: ../../source/models/virtualenv.rst:184 +#: ../../source/models/virtualenv.rst:221 msgid "" "In the Web UI, you can add custom dependencies by clicking the plus icon " "in the same location as the virtual environment toggle." msgstr "在 Web UI 中,可以在虚拟环境开关同一位置点击加号图标来添加自定义依赖。" -#: ../../source/models/virtualenv.rst:186 +#: ../../source/models/virtualenv.rst:223 msgid "" "For the command line, use ``--virtual-env-package`` or ``-vp`` to specify" " a single package version." msgstr "命令行中,使用 ``--virtual-env-package`` 或 ``-vp`` 来指定单个包版本。" -#: ../../source/models/virtualenv.rst:194 +#: ../../source/models/virtualenv.rst:231 msgid "" "In addition to the standard way of specifying package dependencies, such " "as ``transformers==xxx``, Xinference also supports some extended syntax." -msgstr "" -"除了常规的包依赖指定方式(如 ``transformers==xxx``),Xinference 还支持" -"一些扩展语法。" +msgstr "除了常规的包依赖指定方式(如 ``transformers==xxx``),Xinference 还支持一些扩展语法。" -#: ../../source/models/virtualenv.rst:196 +#: ../../source/models/virtualenv.rst:233 msgid "" "``#system_xxx#``: Using the same version as the system site packages, " "such as ``#system_numpy#``, ensures that the installed package matches " "the system site package version of numpy. This helps prevent dependency " "conflicts." msgstr "" -"``#system_xxx#``:使用与系统 site packages 相同的版本,例如 ``#system_" -"numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止" -"依赖冲突。" +"``#system_xxx#``:使用与系统 site packages 相同的版本,例如 " +"``#system_numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止依赖冲突。" + +#: ../../source/models/virtualenv.rst:237 +msgid "Authoring Custom Models (JSON)" +msgstr "创建自定义模型(JSON)" -#: ../../source/models/virtualenv.rst:203 +#: ../../source/models/virtualenv.rst:239 +msgid "" +"When registering a custom model, you can define a ``virtualenv`` block in" +" the model JSON. 
Starting from v2.0 (v4 flow), **engine-aware markers are" +" recommended** so one JSON can cover multiple engines." +msgstr "" +"注册自定义模型时,可在模型 JSON 中定义一个 ``virtualenv`` 块。从 v2.0(v4 流程)开始, **建议使用引擎感知标记** ,以便单个 JSON 文件覆盖多个引擎。" + +#: ../../source/models/virtualenv.rst:243 +msgid "" +"Important rule: If a new model supports a specific engine, you **must** " +"include at least one package entry for that engine in " +"``virtualenv.packages`` and attach a marker, for example ``#engine# == " +"\"vllm\"``. Engine availability checks rely on these markers when virtual" +" environments are enabled." +msgstr "" +"重要规则:若新模型支持特定引擎,则 **必须** 在 ``virtualenv.packages`` 中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` )。" +"当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" + +#: ../../source/models/virtualenv.rst:270 +msgid "``packages`` (required): list of pip requirement strings or markers." +msgstr " ``packages`` (必填):pip 要求字符串或标记的列表。" + +#: ../../source/models/virtualenv.rst:271 +msgid "" +"``inherit_pip_config`` (default ``true``): inherit system pip " +"configuration if present." +msgstr " ``inherit_pip_config`` (默认值为 ``true`` ):若存在系统 pip 配置文件,则继承其设置。" + +#: ../../source/models/virtualenv.rst:272 +msgid "" +"``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``: " +"pip index and mirror controls." +msgstr " ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` : " +"pip 索引和镜像控制。" + +#: ../../source/models/virtualenv.rst:274 +msgid "" +"``index_strategy``: passed through to the virtualenv installer (used by " +"some engines)." +msgstr " ``index_strategy`` :传递给虚拟环境安装程序(由某些引擎使用)。" + +#: ../../source/models/virtualenv.rst:275 +msgid "``no_build_isolation``: pip build isolation switch for tricky builds." 
+msgstr " ``no_build_isolation`` :用于处理复杂构建的pip构建隔离开关。" + +#: ../../source/models/virtualenv.rst:280 +msgid "Use wrapped placeholders to inject engine defaults:" +msgstr "使用包裹的占位符注入引擎默认值:" + +#: ../../source/models/virtualenv.rst:282 +msgid "``#vllm_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:283 +msgid "``#sglang_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:284 +msgid "``#mlx_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:285 +msgid "``#transformers_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:286 +msgid "``#llama_cpp_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:287 +msgid "``#diffusers_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:288 +msgid "``#sentence_transformers_dependencies#``" +msgstr "" + +#: ../../source/models/virtualenv.rst:293 +msgid "" +"Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-" +"sensitive). Engine values are passed in lowercase internally, so prefer " +"lowercase values, for example ``#engine# == \"vllm\"`` or ``#engine# == " +"\"transformers\"``." +msgstr "" +"标记使用 ``#engine#`` 或 ``#model_engine#`` 进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值" +",例如 ``#engine# == \"vllm\"`` 或 ``#engine# == \"transformers\"`` 。" + +#: ../../source/models/virtualenv.rst:301 msgid "Manage Virtual Enviroments" msgstr "虚拟环境管理" -#: ../../source/models/virtualenv.rst:207 +#: ../../source/models/virtualenv.rst:305 msgid "" "Xinference provides comprehensive virtual environment management for " "model dependencies, allowing you to create isolated Python environments " "for each model with specific package requirements." 
msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" -#: ../../source/models/virtualenv.rst:219 +#: ../../source/models/virtualenv.rst:317 msgid "Key Features" msgstr "核心功能" -#: ../../source/models/virtualenv.rst:221 +#: ../../source/models/virtualenv.rst:319 msgid "" "**Multiple Python Version Support**: Each model can have virtual " "environments with different Python versions (e.g., Python 3.10.18, " "3.11.5), enabling compatibility with various model requirements." msgstr "" -"**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境" -"(例如 Python 3.10.18、3.11.5),实现与各种模型要求的兼容性。" +"**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " +"3.10.18、3.11.5),实现与各种模型要求的兼容性。" -#: ../../source/models/virtualenv.rst:226 +#: ../../source/models/virtualenv.rst:324 msgid "" "**Isolated Dependencies**: Each virtual environment contains its own set " "of packages, preventing conflicts between different models' requirements." msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" -#: ../../source/models/virtualenv.rst:231 +#: ../../source/models/virtualenv.rst:329 msgid "Management Operations" msgstr "管理操作" -#: ../../source/models/virtualenv.rst:233 +#: ../../source/models/virtualenv.rst:331 msgid "" "**Listing Virtual Environments**: View all virtual environments across " "your cluster, filtered by model name or worker IP address." msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" -#: ../../source/models/virtualenv.rst:237 +#: ../../source/models/virtualenv.rst:335 msgid "" "**Creating Environments**: Automatically created when launching models " "with enable_virtual_env=true. The system detects your current Python " "version and creates an isolated environment with the required packages." 
msgstr "" -"**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。" -"系统会检测当前的 Python 版本并创建包含所需包的独立环境。" +"**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python " +"版本并创建包含所需包的独立环境。" -#: ../../source/models/virtualenv.rst:242 +#: ../../source/models/virtualenv.rst:340 msgid "" "**Removing Environments**: Delete specific virtual environments by model " "name and optionally Python version, or remove all environments for a " "model." -msgstr "" -"**删除环境** : 可按模型名称和可选的 Python 版本" -"删除特定虚拟环境,或删除模型的所有环境。" +msgstr "**删除环境** : 可按模型名称和可选的 Python 版本删除特定虚拟环境,或删除模型的所有环境。" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po b/doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po index 8e72d0a4e2..41202fdced 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/reference/index.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-01-26 11:51+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -17,7 +17,7 @@ msgstr "" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=utf-8\n" "Content-Transfer-Encoding: 8bit\n" -"Generated-By: Babel 2.14.0\n" +"Generated-By: Babel 2.17.0\n" #: ../../source/reference/index.rst:5 msgid "API Reference" @@ -27,290 +27,450 @@ msgstr "API 指南" msgid "Client" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client `\\ " "\\(base\\_url\\[\\, api\\_key\\]\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.describe_model " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Get model information via RESTful APIs." 
msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_model " "`\\ \\(model\\_uid\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Launch the model based on the parameters on the server via RESTful APIs." msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.get_model_registration " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Get the model with the model type and model name registered on the server." msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.get_launch_model_progress " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Get progress of the specific model." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.cancel_launch_model " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Cancel launching model." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.get_instance_info " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.launch_model " "`\\ \\(model\\_name\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_model_registrations " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "List models registered on the server." 
msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.list_models " "`\\ \\(\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Retrieve the model specifications from the Server." msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.list_cached_models " +"`\\ \\(\\[...\\]\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Get a list of cached models." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.list_deletable_models " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Get the cached models with the model path cached on the server." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.confirm_and_remove_model " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Remove the cached models with the model name cached on the server." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.query_engine_by_model_name " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Get the engine parameters with the model name registered on the server." +msgstr "" + +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.register_model " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Register a custom model." 
msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.terminate_model " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Terminate the specific model running on the server." msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.abort_request " +"`\\ \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "Abort a request." +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.vllm_models " +"`\\ \\(\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.login " +"`\\ \\(username\\, ...\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.get_workers_info " +"`\\ \\(\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.get_supervisor_info " +"`\\ \\(\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.get_progress " +"`\\ \\(request\\_id\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 +msgid "" +":py:obj:`xinference.client.Client.abort_cluster " +"`\\ \\(\\)" +msgstr "" + +#: ../../source/reference/index.rst:39::1 msgid "" ":py:obj:`xinference.client.Client.unregister_model " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:25::1 +#: ../../source/reference/index.rst:39::1 msgid "Unregister a custom model." 
msgstr "" -#: ../../source/reference/index.rst:27 +#: ../../source/reference/index.rst:41 msgid "Model Handles" msgstr "" -#: ../../source/reference/index.rst:31 +#: ../../source/reference/index.rst:45 msgid "ChatModelHandle" msgstr "" -#: ../../source/reference/index.rst:40::1 +#: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:40::1 +#: ../../source/reference/index.rst:54::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulChatModelHandle`" msgstr "" -#: ../../source/reference/index.rst:40::1 +#: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle.chat " "`\\ \\(...\\)" msgstr "" -#: ../../source/reference/index.rst:40::1 +#: ../../source/reference/index.rst:54::1 msgid "" "Given a list of messages comprising a conversation, the model will return" " a response via RESTful APIs." msgstr "" -#: ../../source/reference/index.rst:40::1 +#: ../../source/reference/index.rst:54::1 msgid "" ":py:obj:`xinference.client.handlers.ChatModelHandle.generate " "`\\ \\(prompt\\)" msgstr "" -#: ../../source/reference/index.rst:40::1 -#: ../../source/reference/index.rst:60::1 +#: ../../source/reference/index.rst:54::1 +#: ../../source/reference/index.rst:84::1 msgid "" "Creates a completion for the provided prompt and parameters via RESTful " "APIs." 
msgstr "" -#: ../../source/reference/index.rst:42 +#: ../../source/reference/index.rst:56 msgid "EmbeddingModelHandle" msgstr "" -#: ../../source/reference/index.rst:50::1 +#: ../../source/reference/index.rst:64::1 msgid "" ":py:obj:`xinference.client.handlers.EmbeddingModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:50::1 +#: ../../source/reference/index.rst:64::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulEmbeddingModelHandle`" msgstr "" -#: ../../source/reference/index.rst:50::1 +#: ../../source/reference/index.rst:64::1 msgid "" ":py:obj:`xinference.client.handlers.EmbeddingModelHandle.create_embedding" " `\\ " "\\(...\\)" msgstr "" -#: ../../source/reference/index.rst:50::1 +#: ../../source/reference/index.rst:64::1 msgid "Create an Embedding from user input via RESTful APIs." msgstr "" -#: ../../source/reference/index.rst:52 +#: ../../source/reference/index.rst:66 +msgid "RerankModelHandle" +msgstr "" + +#: ../../source/reference/index.rst:74::1 +msgid "" +":py:obj:`xinference.client.restful.restful_client.RESTfulRerankModelHandle" +" `\\ " +"\\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:74::1 +msgid "" +":py:obj:`xinference.client.restful.restful_client.RESTfulRerankModelHandle.rerank" +" " +"`\\" +" \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:74::1 +msgid "" +"Returns an ordered list of documents ordered by their relevance to the " +"provided query." 
+msgstr "" + +#: ../../source/reference/index.rst:76 msgid "GenerateModelHandle" msgstr "" -#: ../../source/reference/index.rst:60::1 +#: ../../source/reference/index.rst:84::1 msgid "" ":py:obj:`xinference.client.handlers.GenerateModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:60::1 +#: ../../source/reference/index.rst:84::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulGenerateModelHandle`" msgstr "" -#: ../../source/reference/index.rst:60::1 +#: ../../source/reference/index.rst:84::1 msgid "" ":py:obj:`xinference.client.handlers.GenerateModelHandle.generate " "`\\ \\(prompt\\)" msgstr "" -#: ../../source/reference/index.rst:62 +#: ../../source/reference/index.rst:86 msgid "ImageModelHandle" msgstr "" -#: ../../source/reference/index.rst:70::1 +#: ../../source/reference/index.rst:94::1 msgid "" ":py:obj:`xinference.client.handlers.ImageModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:70::1 +#: ../../source/reference/index.rst:94::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulImageModelHandle`" msgstr "" -#: ../../source/reference/index.rst:70::1 +#: ../../source/reference/index.rst:94::1 msgid "" ":py:obj:`xinference.client.handlers.ImageModelHandle.text_to_image " "`\\ " "\\(prompt\\)" msgstr "" -#: ../../source/reference/index.rst:70::1 +#: ../../source/reference/index.rst:94::1 msgid "Creates an image by the input text." 
msgstr "" -#: ../../source/reference/index.rst:72 +#: ../../source/reference/index.rst:96 msgid "AudioModelHandle" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulAudioModelHandle`" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.transcriptions " "`\\ " "\\(audio\\)" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "Transcribes audio into the input language." msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.translations " "`\\ \\(audio\\)" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "Translates audio into English." msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "" ":py:obj:`xinference.client.handlers.AudioModelHandle.speech " "`\\ \\(input\\)" msgstr "" -#: ../../source/reference/index.rst:82::1 +#: ../../source/reference/index.rst:106::1 msgid "Generates audio from the input text." 
msgstr "" -#: ../../source/reference/index.rst:84 +#: ../../source/reference/index.rst:108 +msgid "FlexibleModelHandle" +msgstr "" + +#: ../../source/reference/index.rst:116::1 +msgid "" +":py:obj:`xinference.client.restful.restful_client.RESTfulFlexibleModelHandle" +" `\\" +" \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:116::1 +msgid "" +":py:obj:`xinference.client.restful.restful_client.RESTfulFlexibleModelHandle.infer" +" " +"`\\" +" \\(...\\)" +msgstr "" + +#: ../../source/reference/index.rst:116::1 +msgid "Call flexible model." +msgstr "" + +#: ../../source/reference/index.rst:118 msgid "VideoModelHandle" msgstr "" -#: ../../source/reference/index.rst:90::1 +#: ../../source/reference/index.rst:124::1 msgid "" ":py:obj:`xinference.client.handlers.VideoModelHandle " "`\\" msgstr "" -#: ../../source/reference/index.rst:90::1 +#: ../../source/reference/index.rst:124::1 msgid "" "alias of " ":py:class:`~xinference.client.restful.restful_client.RESTfulVideoModelHandle`" msgstr "" -#: ../../source/reference/index.rst:90::1 +#: ../../source/reference/index.rst:124::1 msgid "" ":py:obj:`xinference.client.handlers.VideoModelHandle.text_to_video " "`\\ " "\\(prompt\\)" msgstr "" -#: ../../source/reference/index.rst:90::1 +#: ../../source/reference/index.rst:124::1 msgid "Creates a video by the input text." 
msgstr "" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po index a6bfa5aa40..79fe392e43 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/backends.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-07-30 10:52+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -28,9 +28,7 @@ msgid "" "Xinference supports multiple backends for different models. After the " "user specifies the model, xinference will automatically select the " "appropriate backend." -msgstr "" -"Xinference 对于不同模型支持不同的推理引擎。用户选择模型后,Xinference 会" -"自动选择合适的引擎" +msgstr "Xinference 对于不同模型支持不同的推理引擎。用户选择模型后,Xinference 会自动选择合适的引擎" #: ../../source/user_guide/backends.rst:11 msgid "llama.cpp" @@ -44,9 +42,9 @@ msgid "" "tensor library `ggml`, supporting inference of the LLaMA series models " "and their variants." msgstr "" -"Xinference 目前支持由 Xinference 团队开发的 `xllamacpp `_ 作为 llama.cpp 后端运行。`llama.cpp` 基于张量" -"库 `ggml` 开发,支持 LLaMA 系列模型及其变体的推理。" +"Xinference 目前支持由 Xinference 团队开发的 `xllamacpp " +"`_ 作为 llama.cpp 后端运行。`llama.cpp` " +"基于张量库 `ggml` 开发,支持 LLaMA 系列模型及其变体的推理。" #: ../../source/user_guide/backends.rst:20 msgid "" @@ -54,9 +52,8 @@ msgid "" "llama.cpp, and ``llama-cpp-python`` is deprecated. Since Xinference " "v1.6.0, ``llama-cpp-python`` has been removed." 
msgstr "" -"自 Xinference v1.5.0 起,``xllamacpp`` 成为 llama.cpp 的默认选项,``llama" -"-cpp-python`` 被弃用;从 Xinference v1.6.0 开始,``llama-cpp-python`` 已" -"被移除。" +"自 Xinference v1.5.0 起,``xllamacpp`` 成为 llama.cpp 的默认选项,``llama-cpp-" +"python`` 被弃用;从 Xinference v1.6.0 开始,``llama-cpp-python`` 已被移除。" #: ../../source/user_guide/backends.rst:25 msgid "" @@ -64,21 +61,19 @@ msgid "" " of the ``common_params`` structure in ``llama.cpp`` `common.h " "`_" msgstr "" -"请参考 ``llama.cpp`` 的 `common.h `_ 中 ``common_params`` 结构体定义设置参数。" +"请参考 ``llama.cpp`` 的 `common.h `_ 中 ``common_params`` " +"结构体定义设置参数。" #: ../../source/user_guide/backends.rst:27 msgid "" "There may be some nested parameters. For example, ``sampling.top_k``. " "Just use the ``.`` to separate nested parameters." -msgstr "" -"可能会有嵌套多层的参数。例如,``sampling.top_k``。请使用 ``.`` 来分割嵌套" -"参数。" +msgstr "可能会有嵌套多层的参数。例如,``sampling.top_k``。请使用 ``.`` 来分割嵌套参数。" #: ../../source/user_guide/backends.rst:29 msgid "Here is an example of setting nested sampling parameters in WebUI:" -msgstr "" -"这里有一个在 WebUI 中设置嵌套 sampling 参数的例子:" +msgstr "这里有一个在 WebUI 中设置嵌套 sampling 参数的例子:" #: ../../source/user_guide/backends.rst:36 msgid "Auto NGL" @@ -88,9 +83,7 @@ msgstr "自动 NGL" msgid "" "Auto GPU layers estimation is enabled since v1.6.1 when ``n-gpu-layers`` " "is not specified (default is -1)." -msgstr "" -"自 v1.6.1 起,当未指定 n-gpu-layers(默认为 -1)时,将自动启用 GPU 层数" -"估算功能。" +msgstr "自 v1.6.1 起,当未指定 n-gpu-layers(默认为 -1)时,将自动启用 GPU 层数估算功能。" #: ../../source/user_guide/backends.rst:41 msgid "" @@ -100,9 +93,8 @@ msgid "" "optimized, and there is still a chance of encountering an out-of-memory " "error." 
msgstr "" -"这个特性可以为 llama.cpp 后端自动设置 GPU 层数(NGL)。请注意这并不是一个" -"精确的计算,因此 ``-ngl`` 结果可能不是最优的,并且仍然可能遇到显存不足的" -"错误。" +"这个特性可以为 llama.cpp 后端自动设置 GPU 层数(NGL)。请注意这并不是一个精确的计算,因此 ``-ngl`` " +"结果可能不是最优的,并且仍然可能遇到显存不足的错误。" #: ../../source/user_guide/backends.rst:45 msgid "" @@ -128,9 +120,7 @@ msgstr "我们的实现是基于 Ollama 的自动 NGL,但是有一些不同之 msgid "" "We utilize device information detected by `xllamacpp " "`_." -msgstr "" -"我们使用 `xllamacpp `_ 提供的设备" -"信息。" +msgstr "我们使用 `xllamacpp `_ 提供的设备信息。" #: ../../source/user_guide/backends.rst:53 msgid "" @@ -148,9 +138,7 @@ msgstr "如果自动 NGL 失败,我们会尝试全部加载到 GPU。" msgid "" "We do not support multimodal projectors embedded into the model GGUF, as " "this is a very experimental feature." -msgstr "" -"我们不支持多模态投影器内嵌到模型的 GGUF,这种格式的模型目前还处于实验阶段" -"。" +msgstr "我们不支持多模态投影器内嵌到模型的 GGUF,这种格式的模型目前还处于实验阶段。" #: ../../source/user_guide/backends.rst:59 msgid "Common Issues" @@ -188,8 +176,8 @@ msgid "" " default. Please increase the context size by either increasing ``n_ctx``" " or reducing ``n_parallel``." msgstr "" -"如果你正在使用 multimodal 功能,``ctx_shift`` 会被默认关闭。请尝试增加 ``" -"n_ctx`` 或者减小 ``n_parallel`` 以增加每个 slot 的 context 大小。" +"如果你正在使用 multimodal 功能,``ctx_shift`` 会被默认关闭。请尝试增加 ``n_ctx`` 或者减小 " +"``n_parallel`` 以增加每个 slot 的 context 大小。" #: ../../source/user_guide/backends.rst:85 #, python-brace-format @@ -208,9 +196,9 @@ msgid "" "serially, increasing ``n_parallel`` can't improve the latency or " "throughput." 
msgstr "" -"可能由于 KV cache 创建失败导致。你可以通过减小 ``n_ctx`` 或者增加 ``n_" -"parallel`` 或者调节 ``n_gpu_layers`` 参数加载部分模型到 GPU 来解决。请" -"注意,如果你只处理串行推理请求,增加 ``n_parallel`` 并不会带来性能提升。" +"可能由于 KV cache 创建失败导致。你可以通过减小 ``n_ctx`` 或者增加 ``n_parallel`` 或者调节 " +"``n_gpu_layers`` 参数加载部分模型到 GPU 来解决。请注意,如果你只处理串行推理请求,增加 ``n_parallel`` " +"并不会带来性能提升。" #: ../../source/user_guide/backends.rst:102 msgid "transformers" @@ -257,8 +245,10 @@ msgid "" msgstr "当满足以下条件时,Xinference 会自动选择 vLLM 作为推理引擎:" #: ../../source/user_guide/backends.rst:120 -msgid "The model format is ``pytorch``, ``gptq`` or ``awq``." -msgstr "模型格式为 ``pytorch`` , ``gptq`` 或者 ``awq`` 。" +msgid "" +"The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or " +"``bnb``." +msgstr "模型格式为 ``pytorch`` , ``gptq`` , ``awq`` , ``fp4`` , ``fp8`` 或者 ``bnb`` 。" #: ../../source/user_guide/backends.rst:121 msgid "When the model format is ``pytorch``, the quantization is ``none``." @@ -282,9 +272,7 @@ msgstr "操作系统为 Linux 并且至少有一个支持 CUDA 的设备" msgid "" "The model family (for custom models) / model name (for builtin models) is" " within the list of models supported by vLLM" -msgstr "" -"自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM" -" 的支持列表中。" +msgstr "自定义模型的 ``model_family`` 字段和内置模型的 ``model_name`` 字段在 vLLM 的支持列表中。" #: ../../source/user_guide/backends.rst:127 msgid "Currently, supported model includes:" @@ -292,174 +280,160 @@ msgstr "目前,支持的模型包括:" #: ../../source/user_guide/backends.rst:131 msgid "" -"``llama-2``, ``llama-3``, ``llama-3.1``, ``llama-3.2-vision``, " -"``llama-2-chat``, ``llama-3-instruct``, ``llama-3.1-instruct``, " -"``llama-3.3-instruct``" +"``code-llama``, ``code-llama-instruct``, ``code-llama-python``, " +"``deepseek``, ``deepseek-chat``, ``deepseek-coder``, ``deepseek-coder-" +"instruct``, ``deepseek-r1-distill-llama``, ``gorilla-openfunctions-v2``, " +"``HuatuoGPT-o1-LLaMA-3.1``, ``llama-2``, ``llama-2-chat``, ``llama-3``, " +"``llama-3-instruct``, ``llama-3.1``, ``llama-3.1-instruct``, " 
+"``llama-3.3-instruct``, ``tiny-llama``, ``wizardcoder-python-v1.0``, " +"``wizardmath-v1.0``, ``Yi``, ``Yi-1.5``, ``Yi-1.5-chat``, ``Yi-1.5-chat-" +"16k``, ``Yi-200k``, ``Yi-chat``" msgstr "" #: ../../source/user_guide/backends.rst:132 msgid "" -"``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``, " -"``mistral-instruct-v0.3``, ``mistral-nemo-instruct``, ``mistral-large-" -"instruct``" +"``codestral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``," +" ``mistral-instruct-v0.3``, ``mistral-large-instruct``, ``mistral-nemo-" +"instruct``, ``mistral-v0.1``, ``openhermes-2.5``, ``seallm_v2``" msgstr "" #: ../../source/user_guide/backends.rst:133 -msgid "``codestral-v0.1``" +msgid "" +"``Baichuan-M2``, ``codeqwen1.5``, ``codeqwen1.5-chat``, ``deepseek-r1" +"-distill-qwen``, ``DianJin-R1``, ``fin-r1``, ``HuatuoGPT-o1-Qwen2.5``, " +"``KAT-V1``, ``marco-o1``, ``qwen1.5-chat``, ``qwen2-instruct``, " +"``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-coder-instruct``, " +"``qwen2.5-instruct``, ``qwen2.5-instruct-1m``, ``qwenLong-l1``, ``QwQ-" +"32B``, ``QwQ-32B-Preview``, ``seallms-v3``, ``skywork-or1``, ``skywork-" +"or1-preview``, ``XiYanSQL-QwenCoder-2504``" msgstr "" #: ../../source/user_guide/backends.rst:134 -msgid "``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``" +msgid "``llama-3.2-vision``, ``llama-3.2-vision-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:135 -msgid "``code-llama``, ``code-llama-python``, ``code-llama-instruct``" +msgid "``baichuan-2``, ``baichuan-2-chat``" msgstr "" #: ../../source/user_guide/backends.rst:136 -msgid "" -"``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-" -"instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, " -"``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, " -"``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-" -"prover-v2``, ``deepseek-r1-0528-qwen3``, ``deepseek-r1-distill-llama``" +msgid 
"``InternLM2ForCausalLM``" msgstr "" #: ../../source/user_guide/backends.rst:137 -msgid "``yi-coder``, ``yi-coder-chat``" +msgid "``qwen-chat``" msgstr "" #: ../../source/user_guide/backends.rst:138 -msgid "``codeqwen1.5``, ``codeqwen1.5-chat``" +msgid "" +"``mixtral-8x22B-instruct-v0.1``, ``mixtral-instruct-v0.1``, " +"``mixtral-v0.1``" msgstr "" #: ../../source/user_guide/backends.rst:139 -msgid "" -"``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-" -"instruct``, ``qwen2.5-instruct-1m``" +msgid "``cogagent``" msgstr "" #: ../../source/user_guide/backends.rst:140 -msgid "``baichuan-2-chat``" +msgid "``glm-edge-chat``, ``glm4-chat``, ``glm4-chat-1m``" msgstr "" #: ../../source/user_guide/backends.rst:141 -msgid "``internlm2-chat``" +msgid "``codegeex4``, ``glm-4v``" msgstr "" #: ../../source/user_guide/backends.rst:142 -msgid "``internlm2.5-chat``, ``internlm2.5-chat-1m``" +msgid "``seallm_v2.5``" msgstr "" #: ../../source/user_guide/backends.rst:143 -msgid "``qwen-chat``" +msgid "``orion-chat``" msgstr "" #: ../../source/user_guide/backends.rst:144 -msgid "``mixtral-instruct-v0.1``, ``mixtral-8x22B-instruct-v0.1``" +msgid "``qwen1.5-moe-chat``, ``qwen2-moe-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:145 -msgid "``chatglm3``, ``chatglm3-32k``, ``chatglm3-128k``" +msgid "``CohereForCausalLM``" msgstr "" #: ../../source/user_guide/backends.rst:146 -msgid "``glm4-chat``, ``glm4-chat-1m``, ``glm4-0414``" +msgid "" +"``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, " +"``deepseek-vl2``" msgstr "" #: ../../source/user_guide/backends.rst:147 -msgid "``codegeex4``" +msgid "" +"``deepseek-prover-v2``, ``deepseek-r1``, ``deepseek-r1-0528``, " +"``deepseek-v3``, ``deepseek-v3-0324``, ``Deepseek-V3.1``, ``moonlight-" +"16b-a3b-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:148 -msgid "``qwen1.5-chat``, ``qwen1.5-moe-chat``" +msgid "``deepseek-r1-0528-qwen3``, ``qwen3``" msgstr "" #: 
../../source/user_guide/backends.rst:149 -msgid "``qwen2-instruct``, ``qwen2-moe-instruct``" +msgid "``minicpm3-4b``" msgstr "" #: ../../source/user_guide/backends.rst:150 -msgid "``XiYanSQL-QwenCoder-2504``" +msgid "``internlm3-instruct``" msgstr "" #: ../../source/user_guide/backends.rst:151 -msgid "``QwQ-32B-Preview``, ``QwQ-32B``" +msgid "``gemma-3-1b-it``" msgstr "" #: ../../source/user_guide/backends.rst:152 -msgid "``marco-o1``" +msgid "``glm4-0414``" msgstr "" #: ../../source/user_guide/backends.rst:153 -msgid "``fin-r1``" +msgid "" +"``minicpm-2b-dpo-bf16``, ``minicpm-2b-dpo-fp16``, ``minicpm-2b-dpo-" +"fp32``, ``minicpm-2b-sft-bf16``, ``minicpm-2b-sft-fp32``, ``minicpm4``" msgstr "" #: ../../source/user_guide/backends.rst:154 -msgid "``seallms-v3``" +msgid "``Ernie4.5``" msgstr "" #: ../../source/user_guide/backends.rst:155 -msgid "``skywork-or1-preview``, ``skywork-or1``" +msgid "``Qwen3-Coder``, ``Qwen3-Instruct``, ``Qwen3-Thinking``" msgstr "" #: ../../source/user_guide/backends.rst:156 -msgid "``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1``" +msgid "``glm-4.5``" msgstr "" #: ../../source/user_guide/backends.rst:157 -msgid "``DianJin-R1``" +msgid "``gpt-oss``" msgstr "" #: ../../source/user_guide/backends.rst:158 -msgid "``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it``" +msgid "``seed-oss``" msgstr "" #: ../../source/user_guide/backends.rst:159 -msgid "``orion-chat``, ``orion-chat-rag``" +msgid "``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``" msgstr "" #: ../../source/user_guide/backends.rst:160 -msgid "``c4ai-command-r-v01``" +msgid "``DeepSeek-V3.2``, ``DeepSeek-V3.2-Exp``" msgstr "" #: ../../source/user_guide/backends.rst:161 -msgid "``minicpm3-4b``" -msgstr "" - -#: ../../source/user_guide/backends.rst:162 -msgid "``internlm3-instruct``" -msgstr "" - -#: ../../source/user_guide/backends.rst:163 -msgid "``moonlight-16b-a3b-instruct``" -msgstr "" - -#: ../../source/user_guide/backends.rst:164 -msgid "``qwenLong-l1``" -msgstr "" - -#: 
../../source/user_guide/backends.rst:165 -msgid "``qwen3``" -msgstr "" - -#: ../../source/user_guide/backends.rst:166 -msgid "``minicpm4``" +msgid "``MiniMax-M2``" msgstr "" #: ../../source/user_guide/backends.rst:167 -msgid "``Ernie4.5``" -msgstr "" - -#: ../../source/user_guide/backends.rst:168 -msgid "``Qwen3-Instruct``" -msgstr "" - -#: ../../source/user_guide/backends.rst:174 msgid "SGLang" msgstr "" -#: ../../source/user_guide/backends.rst:175 +#: ../../source/user_guide/backends.rst:168 msgid "" "`SGLang `_ has a high-performance " "inference runtime with RadixAttention. It significantly accelerates the " @@ -467,21 +441,21 @@ msgid "" "multiple calls. And it also supports other common techniques like " "continuous batching and tensor parallelism." msgstr "" -"`SGLang `_ 具有基于 RadixAttention" -" 的高性能推理运行时。它通过在多个调用之间自动重用KV缓存,显著加速了复杂 " -"LLM 程序的执行。它还支持其他常见推理技术,如连续批处理和张量并行处理。" +"`SGLang `_ 具有基于 RadixAttention " +"的高性能推理运行时。它通过在多个调用之间自动重用KV缓存,显著加速了复杂 LLM " +"程序的执行。它还支持其他常见推理技术,如连续批处理和张量并行处理。" -#: ../../source/user_guide/backends.rst:182 +#: ../../source/user_guide/backends.rst:175 msgid "MLX" msgstr "" -#: ../../source/user_guide/backends.rst:183 +#: ../../source/user_guide/backends.rst:176 msgid "" "`MLX `_ " "provides efficient runtime to run LLM on Apple silicon. It's recommended " "to use for Mac users when running on Apple silicon if the model has MLX " "format support." 
msgstr "" -"`MLX `_ 提供在" -"苹果 silicon 芯片上高效运行 LLM 的方式。在模型包含 MLX 格式的时候,推荐" -"使用苹果 silicon 芯片的 Mac 用户使用 MLX 引擎。" +"`MLX `_ 提供在苹果 " +"silicon 芯片上高效运行 LLM 的方式。在模型包含 MLX 格式的时候,推荐使用苹果 silicon 芯片的 Mac 用户使用 MLX " +"引擎。" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po index 6c7bd5fed5..4c89e64b9c 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-06-26 23:29+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -29,30 +29,26 @@ msgid "" "too large to fit into GPus on a single machine, Xinference supported " "running these models across multiple machines." msgstr "" -"一些语言模型,包括 **DeepSeek V3**、**DeepSeek R1** 等,体积过大,无法" -"适配单台机器上的 GPU,Xinference 支持在多台机器上运行这些模型。" +"一些语言模型,包括 **DeepSeek V3**、**DeepSeek R1** 等,体积过大,无法适配单台机器上的 " +"GPU,Xinference 支持在多台机器上运行这些模型。" -#: ../../source/user_guide/distributed_inference.rst:10 -msgid "This feature is added in v1.3.0." -msgstr "这些特性在 v1.3.0 中添加。" - -#: ../../source/user_guide/distributed_inference.rst:14 +#: ../../source/user_guide/distributed_inference.rst:13 msgid "Supported Engines" msgstr "支持的引擎" -#: ../../source/user_guide/distributed_inference.rst:15 +#: ../../source/user_guide/distributed_inference.rst:14 msgid "Now, Xinference supported below engines to run models across workers." 
msgstr "现在,Xinference 支持如下引擎在多台 worker 上运行模型。" -#: ../../source/user_guide/distributed_inference.rst:17 +#: ../../source/user_guide/distributed_inference.rst:16 msgid ":ref:`SGLang ` (supported in v1.3.0)" msgstr ":ref:`SGLang ` (在 v1.3.0 中支持)" -#: ../../source/user_guide/distributed_inference.rst:18 +#: ../../source/user_guide/distributed_inference.rst:17 msgid ":ref:`vLLM ` (supported in v1.4.1)" msgstr ":ref:`vLLM ` (在 v1.4.1 中支持)" -#: ../../source/user_guide/distributed_inference.rst:19 +#: ../../source/user_guide/distributed_inference.rst:18 msgid "" ":ref:`MLX ` (supported in v1.7.1), MLX distributed currently" " does not support all models. The following model types are supported at " @@ -60,57 +56,64 @@ msgid "" "GitHub issue at `https://github.com/xorbitsai/inference/issues " "`_ to request support." msgstr "" -":ref:`MLX ` (自 v1.7.1 起支持)目前在分布式模式下并不支持" -"所有模型。目前支持以下几种模型类型。如果你有其他需求,欢迎在 `https://" -"github.com/xorbitsai/inference/issues `_ 提交 GitHub issue 来请求支持。" +":ref:`MLX ` (自 v1.7.1 " +"起支持)目前在分布式模式下并不支持所有模型。目前支持以下几种模型类型。如果你有其他需求,欢迎在 " +"`https://github.com/xorbitsai/inference/issues " +"`_ 提交 GitHub issue 来请求支持。" -#: ../../source/user_guide/distributed_inference.rst:23 +#: ../../source/user_guide/distributed_inference.rst:22 msgid "DeepSeek v3 and R1" msgstr "DeepSeek v3 和 R1" -#: ../../source/user_guide/distributed_inference.rst:24 +#: ../../source/user_guide/distributed_inference.rst:23 msgid "Qwen2.5-instruct and the models have the same model architectures." msgstr "Qwen2.5-instruct 及其他具有相同模型架构的模型。" -#: ../../source/user_guide/distributed_inference.rst:25 +#: ../../source/user_guide/distributed_inference.rst:24 msgid "Qwen3 and the models have the same model architectures." msgstr "Qwen3 及其他具有相同模型架构的模型。" -#: ../../source/user_guide/distributed_inference.rst:26 +#: ../../source/user_guide/distributed_inference.rst:25 msgid "Qwen3-moe and the models have the same model architectures." 
msgstr "Qwen3-moe 及其他具有相同模型架构的模型。" -#: ../../source/user_guide/distributed_inference.rst:31 +#: ../../source/user_guide/distributed_inference.rst:30 msgid "Usage" msgstr "使用" -#: ../../source/user_guide/distributed_inference.rst:32 +#: ../../source/user_guide/distributed_inference.rst:31 msgid "" "First you need at least 2 workers to support distributed inference. Refer" " to :ref:`running Xinference in cluster ` to" " create a Xinference cluster including supervisor and workers." msgstr "" -"首先,您需要至少 2 个工作节点来支持分布式推理。请参考 :ref:`在集群中运行 " -"Xinference ` 以创建包含 supervisor 节点和 " -"worker 节点的 Xinference 集群。" +"首先,您需要至少 2 个工作节点来支持分布式推理。请参考 :ref:`在集群中运行 Xinference " +"` 以创建包含 supervisor 节点和 worker 节点的 Xinference" +" 集群。" -#: ../../source/user_guide/distributed_inference.rst:36 +#: ../../source/user_guide/distributed_inference.rst:35 +msgid "" +"vLLM (v0.11.0+) note: Starting from vLLM v0.11.0, distributed deployment " +"with vLLM requires Xinference >= v1.17.1. In addition to setting " +"``--n-worker`` as before, you must also set ``tensor_parallel_size=2`` " +"and ``pipeline_parallel_size=1`` when launching the model." +msgstr "" +"vLLM(v0.11.0+)注意事项:从vLLM v0.11.0版本开始,使用vLLM进行分布式部署需要Xinference >= v1.17.1版本。" +"除原有的 ``--n-worker`` 参数设置外,启动模型时还必须同时设置 ``tensor_parallel_size=2`` 和 ``pipeline_parallel_size=1`` 参数。" + +#: ../../source/user_guide/distributed_inference.rst:40 msgid "" "Then if are using web UI, choose expected machines for ``worker count`` " "in the optional configurations, if you are using command line, add " "``--n-worker `` when launching a model. The model will be" " launched across multiple workers accordingly." 
msgstr "" -"然后,如果您使用的是 Web UI,请在可选配置中选择期望的机器数量作为 ``" -"worker count``;如果您使用的是命令行,启动模型时请添加 ``--n-worker <机器" -"数量>``。模型将相应地在多个工作节点上启动。" +"然后,如果您使用的是 Web UI,请在可选配置中选择期望的机器数量作为 ``worker count``;如果您使用的是命令行,启动模型时请添加" +" ``--n-worker <机器数量>``。模型将相应地在多个工作节点上启动。" -#: ../../source/user_guide/distributed_inference.rst:44 +#: ../../source/user_guide/distributed_inference.rst:48 msgid "" "``GPU count`` on web UI, or ``--n-gpu`` for command line now mean GPUs " "count per worker if you are using distributed inference." -msgstr "" -"使用分布式推理时,在 Web UI 中的 ``GPU count`` 或命令行中的 ``--n-gpu`` " -"现在表示每个工作节点的 GPU 数量。" +msgstr "使用分布式推理时,在 Web UI 中的 ``GPU count`` 或命令行中的 ``--n-gpu`` 现在表示每个工作节点的 GPU 数量。" diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po index 3625f02c4e..8eb4fe29f4 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2025-12-13 13:41+0800\n" +"POT-Creation-Date: 2026-01-28 11:54+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -41,10 +41,9 @@ msgid "" "Meanwhile, users see it as a single model, which greatly improves overall" " resource utilization." msgstr "" -"副本用来指定模型加载的实例份数。比如,你有两张 GPU,每张卡可以放下模型的" -"一个副本,你可以设置副本数为 2。这样,两个完全相同的模型实例将分布在这" -"两张 GPU 上。Xinference 会自动进行负载均衡,确保请求均匀分配到多张卡上。" -"用户看到的仍是一个模型,这大大提升了整体资源利用率。" +"副本用来指定模型加载的实例份数。比如,你有两张 GPU,每张卡可以放下模型的一个副本,你可以设置副本数为 " +"2。这样,两个完全相同的模型实例将分布在这两张 GPU 上。Xinference " +"会自动进行负载均衡,确保请求均匀分配到多张卡上。用户看到的仍是一个模型,这大大提升了整体资源利用率。" #: ../../source/user_guide/launch.rst:17 msgid "Traditional Multi-Instance Deployment:" @@ -55,9 +54,7 @@ msgid "" "When you have multiple GPU cards, each capable of hosting one model " "instance, you can set the number of instances equal to the number of " "GPUs. 
For example:" -msgstr "" -"当您拥有多张GPU显卡时,每张显卡可承载一个模型实例,此时可将实例数量设置为" -"等于GPU数量。例如:" +msgstr "当您拥有多张GPU显卡时,每张显卡可承载一个模型实例,此时可将实例数量设置为等于GPU数量。例如:" #: ../../source/user_guide/launch.rst:21 msgid "2 GPUs, 2 instances: Each GPU runs one model instance" msgstr "" @@ -131,10 +128,10 @@ msgstr "混合分配策略" msgid "" "The current policy is *Idle First*: The scheduler always attempts to " "assign replicas to the least utilized GPU. Use the " -"``XINFERENCE_ENV_LAUNCH_STRATEGY`` parameter to choose launch strategy." +"``XINFERENCE_LAUNCH_STRATEGY`` parameter to choose launch strategy." msgstr "" -"当前策略为 *空闲优先* :调度器始终尝试将副本分配至最空闲的GPU。" -"使用 ``XINFERENCE_ENV_LAUNCH_STRATEGY`` 参数选择启动策略。" +"当前策略为 *空闲优先* :调度器始终尝试将副本分配至最空闲的GPU。使用 ``XINFERENCE_LAUNCH_STRATEGY`` " +"参数选择启动策略。" #: ../../source/user_guide/launch.rst:59 msgid "Set Environment Variables" msgstr "设置环境变量" msgid "" "Sometimes, we want to specify environment variables for a particular " "model at runtime. Starting from v1.8.1, Xinference provides the ability to " "configure these individually without needing to set them before starting " "Xinference." msgstr "" -"有时我们希望在运行时为特定模型指定环境变量。从 v1.8.1 开始,Xinference " -"提供了单独配置环境变量的功能,无需在启动 Xinference 前设置。" +"有时我们希望在运行时为特定模型指定环境变量。从 v1.8.1 开始,Xinference 提供了单独配置环境变量的功能,无需在启动 " +"Xinference 前设置。" #: ../../source/user_guide/launch.rst:66 msgid "For Web UI." msgstr "Web UI 使用。" msgid "" "For the command line, use ``--env`` to specify an environment " "variable." msgstr "命令行使用时,使用 ``--env`` 指定环境变量。" -#: ../../source/user_guide/launch.rst:74 +#: ../../source/user_guide/launch.rst:74 ../../source/user_guide/launch.rst:123 msgid "Example usage:" msgstr "示例用法:" msgid "" "Take vLLM as an example. It has V1 and V0 versions, and by default it " "automatically determines which version to use. If you want to force the " "use of V0 by setting ``VLLM_USE_V1=0`` when launching a model, you can " "specify this during model launching."
msgstr "" -"以 vLLM 为例,它有 V1 和 V0 两个版本,默认会自动判定使用哪个版本。如果想" -"在加载模型时强制通过设置 ``VLLM_USE_V1=0`` 来使用 V0,可以指定该环境变量" -"。" +"以 vLLM 为例,它有 V1 和 V0 两个版本,默认会自动判定使用哪个版本。如果想在加载模型时强制通过设置 ``VLLM_USE_V1=0``" +" 来使用 V0,可以指定该环境变量。" #: ../../source/user_guide/launch.rst:84 msgid "Configuring Model Virtual Environment" @@ -183,7 +179,61 @@ msgstr "配置模型虚拟空间" msgid "" "For this part, please refer to :ref:`toggling virtual environments and " "customizing dependencies `." +msgstr "对于这部分,请参考 :ref:`开关虚拟空间和定制依赖 `。" + +#: ../../source/user_guide/launch.rst:91 +msgid "Batching / Continuous Batching" +msgstr "批处理 / 连续批处理" + +#: ../../source/user_guide/launch.rst:93 +msgid "" +"Xinference supports batching for higher throughput. For LLMs on the " +"``transformers`` engine, continuous batching is available and can be " +"enabled via environment variables at launch time." msgstr "" -"对于这部分,请参考 :ref:`开关虚拟空间和定制依赖 `。" +"Xinference支持批处理以提升吞吐量。对于基于 ``transformers`` " +"引擎的大型语言模型,可启用连续批处理功能,该功能可在启动时通过环境变量进行配置。" + +#: ../../source/user_guide/launch.rst:96 +msgid "Key settings:" +msgstr "关键设置:" + +#: ../../source/user_guide/launch.rst:98 +msgid "" +"``XINFERENCE_BATCH_SIZE`` and ``XINFERENCE_BATCH_INTERVAL`` for general " +"batching behavior." +msgstr " ``XINFERENCE_BATCH_SIZE`` 和 ``XINFERENCE_BATCH_INTERVAL`` 用于控制常规的批处理行为。" + +#: ../../source/user_guide/launch.rst:99 +msgid "" +"``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE`` for text-to-image models (when" +" supported)." +msgstr " ``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE``(文本转图像模型,当支持时)。" + +#: ../../source/user_guide/launch.rst:101 +msgid "Example (LLM, transformers):" +msgstr "示例(大型语言模型,Transformers):" + +#: ../../source/user_guide/launch.rst:108 +msgid "Example (text-to-image):" +msgstr "示例(文生图):" + +#: ../../source/user_guide/launch.rst:114 +msgid "" +"For detailed behavior, supported models, and aborting requests, see " +":ref:`Continuous Batching `." 
+msgstr "" +"有关详细行为、支持的模型以及中止请求的信息,请参阅" +" :ref:`连续批处理 ` 。" + +#: ../../source/user_guide/launch.rst:118 +msgid "Thinking Mode" +msgstr "思考模式" + +#: ../../source/user_guide/launch.rst:120 +msgid "" +"Some hybrid reasoning models (for example, Qwen3) support an optional " +"*thinking mode*. You can enable this at launch time via ``--enable-" +"thinking``." +msgstr "某些混合推理模型(例如Qwen3)支持可选的 *思考模式* 。您可在启动时通过 ``--enable-thinking`` 参数启用该功能。" diff --git a/doc/source/models/custom.rst b/doc/source/models/custom.rst index 87b479cab8..27f70b939e 100644 --- a/doc/source/models/custom.rst +++ b/doc/source/models/custom.rst @@ -18,7 +18,7 @@ For example: .. code-tab:: bash shell - xinference launch --model_path --model-engine -n qwen1.5-chat + xinference launch --model-path --model-engine -n qwen1.5-chat .. code-tab:: bash cURL @@ -49,9 +49,32 @@ The above example demonstrates how to directly launch a qwen1.5-chat model file For distributed scenarios, if your model file is on a specific worker, you can directly launch it using the ``worker_ip`` and ``model_path`` parameters with the launch interface. +.. note:: + For CLI usage, prefer ``--model-path`` (kebab-case). ``--model_path`` is legacy-compatible but not recommended. + Define a custom model ~~~~~~~~~~~~~~~~~~~~~~~~~ +Web UI: Automatic LLM Config Parsing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When registering a custom LLM via the Web UI, Xinference can automatically parse the model +configuration and pre-fill key fields for you. + +You only need to provide: + +- **Model path / Model ID** (where the model lives, local path or hub ID) +- **Model Family** + +After parsing, the UI can auto-populate fields such as: + +- ``Context Length`` +- ``Model_Languages`` +- ``Model_Abilities`` +- ``Model_Specs`` + +You can review and edit these fields before saving the custom model. + Define a custom model based on the following templates: .. 
tabs:: @@ -277,8 +300,9 @@ Define a custom model based on the following templates: * model_family: A required string representing the family of the model you want to register. This parameter must not conflict with any builtin model names. * model_specs: An array of objects defining the specifications of the model. These include: * model_format: A string that defines the model format, like "pytorch" or "ggufv2". - * model_size_in_billions: An integer defining the size of the model in billions of parameters. - * quantizations: A list of strings defining the available quantizations for the model. For PyTorch models, it could be "4-bit", "8-bit", or "none". For ggufv2 models, the quantizations should correspond to values that work with the ``model_file_name_template``. +* model_size_in_billions: An integer defining the size of the model in billions of parameters. +* quantizations: A list of strings defining the available quantizations for the model. For PyTorch models, it could be "4-bit", "8-bit", or "none". For ggufv2 models, the quantizations should correspond to values that work with the ``model_file_name_template``. + Some engines also support ``fp4`` / ``fp8`` / ``bnb`` formats (see :ref:`installation` for backend support details). * model_id: A string representing the model ID, possibly referring to an identifier used by Hugging Face. **If model_uri is missing, Xinference will try to download the model from the huggingface repository specified here.**. * model_hub: A string representing where to download the model from, like "Huggingface" or "modelscope" * model_uri: A string representing the URI where the model can be loaded from, such as "file:///path/to/llama-2-7b". **When the model format is ggufv2, model_uri must be the specific file path. When the model format is pytorch, model_uri must be the path to the directory containing the model files.** If model URI is absent, Xinference will try to download the model from Hugging Face with the model ID. 
@@ -289,7 +313,7 @@ Define a custom model based on the following templates: * reasoning_start_tag: A special token or prompt used to explicitly instruct the LLM to begin its chain-of-thought or reasoning process in its output. * reasoning_end_tag: A special token or prompt used to explicitly mark the end of the model's chain-of-thought or reasoning process in its output. * cache_config: A string representing the parameters and rules for how the system stores and manages temporary data (cache). -* virtualenv: An array refers to the name or path of a self-contained Python environment used to isolate dependencies required to run a specific model or project. Please refer to :ref:`this document `. +* virtualenv: A settings object for model dependency isolation. Please refer to :ref:`this document ` for details. Register a Custom Model ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst index 39112458da..5de5766142 100644 --- a/doc/source/models/virtualenv.rst +++ b/doc/source/models/virtualenv.rst @@ -49,22 +49,58 @@ Example usage: .. note:: - The model virtual environment feature is disabled by default (i.e., XINFERENCE_ENABLE_VIRTUAL_ENV is set to 0). + Starting from **Xinference v2.0**, the model virtual environment feature is + enabled by default (i.e., ``XINFERENCE_ENABLE_VIRTUAL_ENV`` defaults to ``1``). - It will be enabled by default starting from Xinference v2.0.0. + To disable it globally, set ``XINFERENCE_ENABLE_VIRTUAL_ENV=0`` when starting Xinference. When enabled, Xinference will automatically create a dedicated virtual environment for each model when it is loaded, and install its specific dependencies there. This prevents dependency conflicts between models, allowing them to run in isolation without affecting one another. 
-Supported Models -################ +Using Virtual Environments (v2.0) +################################# + +Global toggle +~~~~~~~~~~~~~ + +Virtual environments are enabled by default starting from v2.0. You can still override this globally: + +.. code-block:: bash + + # Enable globally (default) + XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local -H 0.0.0.0 -p 9997 + + # Disable globally + XINFERENCE_ENABLE_VIRTUAL_ENV=0 xinference-local -H 0.0.0.0 -p 9997 + +Per-model override at launch time +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can override the global setting when launching a model: + +.. code-block:: bash + + # Force enable for this model + xinference launch -n qwen2.5-instruct --model-engine transformers --enable-virtual-env + + # Force disable for this model + xinference launch -n qwen2.5-instruct --model-engine transformers --disable-virtual-env + +Add or override packages at launch time +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use ``--virtual-env-package`` (or ``-vp``) multiple times: + +.. code-block:: bash -Currently, this feature supports the following models: + xinference launch -n qwen2.5-instruct --model-engine transformers \ + --virtual-env-package transformers==4.46.3 \ + --virtual-env-package accelerate==0.33.0 + +If you specify a package that already exists in the model's default virtualenv package list, +your version replaces the default instead of being appended. -* :ref:`GOT-OCR2 ` -* :ref:`Qwen2.5-omni ` -* ... 
(New models since v1.5.0 will all consider to add support) Storage Location ################ @@ -74,6 +110,7 @@ By default, the model’s virtual environment is stored under path: * Before v1.6.0: :ref:`XINFERENCE_HOME ` / virtualenv / {model_name} * From v1.6.0 to v1.13.0: :ref:`XINFERENCE_HOME ` / virtualenv / v2 / {model_name} * Since v1.14.0: :ref:`XINFERENCE_HOME ` / virtualenv / v3 / {model_name} / {python_version} +* Since v2.0: :ref:`XINFERENCE_HOME ` / virtualenv / v4 / {model_name} / {model_engine} / {python_version} Experimental Feature #################### @@ -196,6 +233,67 @@ In addition to the standard way of specifying package dependencies, such as ``tr * ``#system_xxx#``: Using the same version as the system site packages, such as ``#system_numpy#``, ensures that the installed package matches the system site package version of numpy. This helps prevent dependency conflicts. +Authoring Custom Models (JSON) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When registering a custom model, you can define a ``virtualenv`` block in the model JSON. +Starting from v2.0 (v4 flow), **engine-aware markers are recommended** so one JSON can cover +multiple engines. + +Important rule: +If a new model supports a specific engine, you **must** include at least one package +entry for that engine in ``virtualenv.packages`` and attach a marker, for example +``#engine# == "vllm"``. Engine availability checks rely on these markers when +virtual environments are enabled. + +Minimal virtualenv block (recommended) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
code-block:: json
+
+   {
+     "virtualenv": {
+       "packages": [
+         "#transformers_dependencies# ; #engine# == \"transformers\"",
+         "#vllm_dependencies# ; #engine# == \"vllm\"",
+         "#sglang_dependencies# ; #engine# == \"sglang\"",
+         "#llama_cpp_dependencies# ; #engine# == \"llama.cpp\"",
+         "#mlx_dependencies# ; #engine# == \"mlx\"",
+         "#system_numpy# ; #engine# == \"vllm\""
+       ]
+     }
+   }
+
+Field reference
+^^^^^^^^^^^^^^^
+
+- ``packages`` (required): list of pip requirement strings or markers.
+- ``inherit_pip_config`` (default ``true``): inherit system pip configuration if present.
+- ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``:
+  pip index and mirror controls.
+- ``index_strategy``: passed through to the virtualenv installer (used by some engines).
+- ``no_build_isolation``: pip build isolation switch for tricky builds.
+
+Engine placeholders
+^^^^^^^^^^^^^^^^^^^
+
+Use wrapped placeholders to inject engine defaults:
+
+- ``#vllm_dependencies#``
+- ``#sglang_dependencies#``
+- ``#mlx_dependencies#``
+- ``#transformers_dependencies#``
+- ``#llama_cpp_dependencies#``
+- ``#diffusers_dependencies#``
+- ``#sentence_transformers_dependencies#``
+
+Markers and case
+^^^^^^^^^^^^^^^^
+
+Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-sensitive).
+Engine values are passed in lowercase internally, so prefer lowercase values,
+for example ``#engine# == "vllm"`` or ``#engine# == "transformers"``.
+
 .. _manage_virtual_enviroments:

@@ -242,5 +340,3 @@ environment with the required packages.

 **Removing Environments**: Delete specific virtual environments by model name
 and optionally Python version, or remove all environments for a model.
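The marker semantics described above can be illustrated with a toy resolver. This is not Xinference's actual implementation — just a sketch of how entries gated by ``#engine#`` comparisons could select package requirements for a given engine:

```python
# Illustrative only: a toy resolver for the engine-marker syntax shown above.
# Xinference's real logic may differ; this demonstrates the semantics of
# entries like '#vllm_dependencies# ; #engine# == "vllm"'.
def resolve_packages(packages, engine):
    selected = []
    for entry in packages:
        requirement, _, marker = entry.partition(";")
        requirement, marker = requirement.strip(), marker.strip()
        if marker:
            # Split into left-hand side, operator, right-hand side.
            lhs, op, rhs = marker.split(maxsplit=2)
            # Substitute the placeholder; engines are passed in lowercase.
            value = engine.lower() if lhs in ("#engine#", "#model_engine#") else lhs
            if op == "==" and value != rhs.strip('"'):
                continue  # marker did not match: drop this entry
        selected.append(requirement)
    return selected

packages = [
    '#vllm_dependencies# ; #engine# == "vllm"',
    '#transformers_dependencies# ; #engine# == "transformers"',
    "numpy",  # unconditional entry, kept for every engine
]
print(resolve_packages(packages, "vllm"))  # ['#vllm_dependencies#', 'numpy']
```

With virtual environments enabled, an analogous check is what lets engine availability be derived from the ``virtualenv.packages`` markers.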
- - diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst index 9209fe6530..e9984a91c3 100644 --- a/doc/source/reference/index.rst +++ b/doc/source/reference/index.rst @@ -15,11 +15,25 @@ Client xinference.client.Client.describe_model xinference.client.Client.get_model xinference.client.Client.get_model_registration + xinference.client.Client.get_launch_model_progress + xinference.client.Client.cancel_launch_model + xinference.client.Client.get_instance_info xinference.client.Client.launch_model xinference.client.Client.list_model_registrations xinference.client.Client.list_models + xinference.client.Client.list_cached_models + xinference.client.Client.list_deletable_models + xinference.client.Client.confirm_and_remove_model + xinference.client.Client.query_engine_by_model_name xinference.client.Client.register_model xinference.client.Client.terminate_model + xinference.client.Client.abort_request + xinference.client.Client.vllm_models + xinference.client.Client.login + xinference.client.Client.get_workers_info + xinference.client.Client.get_supervisor_info + xinference.client.Client.get_progress + xinference.client.Client.abort_cluster xinference.client.Client.unregister_model @@ -48,6 +62,16 @@ EmbeddingModelHandle xinference.client.handlers.EmbeddingModelHandle.create_embedding +RerankModelHandle +^^^^^^^^^^^^^^^^^ +.. autosummary:: + :toctree: generated/ + + xinference.client.restful.restful_client.RESTfulRerankModelHandle + + xinference.client.restful.restful_client.RESTfulRerankModelHandle.rerank + + GenerateModelHandle ^^^^^^^^^^^^^^^^^^^ .. autosummary:: @@ -80,6 +104,16 @@ AudioModelHandle xinference.client.handlers.AudioModelHandle.speech +FlexibleModelHandle +^^^^^^^^^^^^^^^^^^^ +.. autosummary:: + :toctree: generated/ + + xinference.client.restful.restful_client.RESTfulFlexibleModelHandle + + xinference.client.restful.restful_client.RESTfulFlexibleModelHandle.infer + + VideoModelHandle ^^^^^^^^^^^^^^^^ .. 
autosummary::
diff --git a/doc/source/user_guide/backends.rst b/doc/source/user_guide/backends.rst
index b87fccd838..b6cf5e3c9a 100644
--- a/doc/source/user_guide/backends.rst
+++ b/doc/source/user_guide/backends.rst
@@ -117,7 +117,7 @@ vLLM is fast with:
 
 When the following conditions are met, Xinference will choose vLLM as the inference engine:
 
-- The model format is ``pytorch``, ``gptq`` or ``awq``.
+- The model format is ``pytorch``, ``gptq``, ``awq``, ``fp4``, ``fp8`` or ``bnb``.
 - When the model format is ``pytorch``, the quantization is ``none``.
 - When the model format is ``awq``, the quantization is ``Int4``.
 - When the model format is ``gptq``, the quantization is ``Int3``, ``Int4`` or ``Int8``.
@@ -177,4 +177,3 @@ MLX to run LLM on Apple silicon.
 
 It's recommended for Mac users running on Apple silicon if the model has
 MLX format support.
-
diff --git a/doc/source/user_guide/distributed_inference.rst b/doc/source/user_guide/distributed_inference.rst
index cc092fc552..343961f5dd 100644
--- a/doc/source/user_guide/distributed_inference.rst
+++ b/doc/source/user_guide/distributed_inference.rst
@@ -32,6 +32,11 @@ First you need at least 2 workers to support distributed inference. Refer to
 :ref:`running Xinference in cluster ` to create a
 Xinference cluster including supervisor and workers.
 
+vLLM (v0.11.0+) note:
+Starting from vLLM v0.11.0, distributed deployment with vLLM requires Xinference >= v1.17.1.
+In addition to setting ``--n-worker`` as before, you must also set
+``tensor_parallel_size=2`` and ``pipeline_parallel_size=1`` when launching the model.
+
 Then, if you are using the web UI, choose the expected machines for ``worker count`` in the
 optional configurations; if you are using the command line, add ``--n-worker``
 when launching a model. The model will be launched across multiple workers accordingly.
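The vLLM note above ties ``--n-worker`` to vLLM's parallel sizes. As a hedged sketch of that arithmetic (parameter names are taken from the text; it assumes tensor parallelism spans every GPU across the workers, with no pipeline parallelism):

```python
# Illustrative helper, not an Xinference API: derive the vLLM parallel sizes
# suggested by the distributed-deployment note for a given cluster layout.
def vllm_parallel_config(n_worker, gpus_per_worker):
    total_gpus = n_worker * gpus_per_worker
    return {
        "n_worker": n_worker,
        # Assumption: tensor parallelism covers the total GPU count.
        "tensor_parallel_size": total_gpus,
        # The note fixes pipeline parallelism at 1.
        "pipeline_parallel_size": 1,
    }

# Two workers with one GPU each -> tensor_parallel_size=2, as in the note.
print(vllm_parallel_config(n_worker=2, gpus_per_worker=1))
```

The two extra parameters would be passed as additional model-launch options alongside ``--n-worker``.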
diff --git a/doc/source/user_guide/launch.rst b/doc/source/user_guide/launch.rst
index 8fbb028718..45d081d664 100644
--- a/doc/source/user_guide/launch.rst
+++ b/doc/source/user_guide/launch.rst
@@ -27,7 +27,7 @@ Introduce a new environment variable:
 
 .. code-block:: bash
 
-   XINFERENCE_ENV_ALLOW_MULTI_REPLICA_PER_GPU
+   XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU
 
 Control whether to enable the single GPU multi-copy feature
 Default value: 1
@@ -53,7 +53,7 @@ Smart Allocation: Number of replicas may differ from GPU count; system intellige
 GPU Allocation Strategy
 =======================
 
-The current policy is *Idle First*: The scheduler always attempts to assign replicas to the least utilized GPU. Use the ``XINFERENCE_ENV_LAUNCH_STRATEGY`` parameter to choose launch strategy.
+The current policy is *Idle First*: The scheduler always attempts to assign replicas to the least utilized GPU. Use the ``XINFERENCE_LAUNCH_STRATEGY`` parameter to choose launch strategy.
 
 Set Environment Variables
 =========================
@@ -86,3 +86,42 @@ Configuring Model Virtual Environment
 .. versionadded:: v1.8.1
 
 For this part, please refer to :ref:`toggling virtual environments and customizing dependencies `.
+
+Batching / Continuous Batching
+==============================
+
+Xinference supports batching for higher throughput. For LLMs on the ``transformers`` engine,
+continuous batching is available and can be enabled via environment variables at launch time.
+
+Key settings:
+
+- ``XINFERENCE_BATCH_SIZE`` and ``XINFERENCE_BATCH_INTERVAL`` for general batching behavior.
+- ``XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE`` for text-to-image models (when supported).
+
+Example (LLM, transformers):
+
+.. code-block:: bash
+
+   XINFERENCE_BATCH_SIZE=32 XINFERENCE_BATCH_INTERVAL=0.003 xinference-local --log-level debug
+   xinference launch -e <endpoint> --model-engine transformers -n qwen1.5-chat -s 4 -f pytorch -q none
+
+Example (text-to-image):
+
+.. 
code-block:: bash + + XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE=1024*1024 xinference-local --log-level debug + +For detailed behavior, supported models, and aborting requests, see +:ref:`Continuous Batching `. + +Thinking Mode +============= + +Some hybrid reasoning models (for example, Qwen3) support an optional *thinking mode*. +You can enable this at launch time via ``--enable-thinking``. + +Example usage: + +.. code-block:: bash + + xinference launch -n qwen3-xxx --model-engine vllm --enable-thinking From ac60b95c15e7d3322316b1f6e53638f890981391 Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 11:16:49 +0800 Subject: [PATCH 2/8] fix some doc issues --- .../getting_started/using_docker_image.rst | 5 +- .../getting_started/using_docker_image.po | 64 +++----- .../zh_CN/LC_MESSAGES/models/virtualenv.po | 149 +++++++++--------- .../user_guide/distributed_inference.po | 12 +- doc/source/models/custom.rst | 2 + doc/source/models/virtualenv.rst | 24 ++- .../user_guide/distributed_inference.rst | 2 +- 7 files changed, 123 insertions(+), 135 deletions(-) diff --git a/doc/source/getting_started/using_docker_image.rst b/doc/source/getting_started/using_docker_image.rst index 3e8b293736..923c9ad5d3 100644 --- a/doc/source/getting_started/using_docker_image.rst +++ b/doc/source/getting_started/using_docker_image.rst @@ -11,8 +11,6 @@ Prerequisites ============= * The image can only run in an environment with GPUs and CUDA installed, because Xinference in the image relies on Nvidia GPUs for acceleration. * CUDA must be successfully installed on the host machine. This can be determined by whether you can successfully execute the ``nvidia-smi`` command. -* For CUDA version < 12.8, CUDA version in the docker image is ``12.4``, and the CUDA version on the host machine should be ``12.4`` or above, and the NVIDIA driver version should be ``550`` or above. 
-* For CUDA version >= 12.8 and <12.9, CUDA version in the docker image is ``12.8``, and the CUDA version on the host machine should be ``12.8`` or above, and the NVIDIA driver version should be ``570`` or above. * For CUDA version >= 12.9, CUDA version in the docker image is ``12.9``, and the CUDA version on the host machine should be ``12.9`` or above, and the NVIDIA driver version should be ``575`` or above. * Ensure `NVIDIA Container Toolkit `_ installed. @@ -28,7 +26,7 @@ Available tags include: * For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``. * For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. (Xinference version should be v1.16.0 at least) -.. note:: +.. versionchanged:: v2.0.0 Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images are officially provided. @@ -98,4 +96,3 @@ at /.cache/huggingface and /.cache/modelscope. The command --gpus all \ xprobe/xinference:v \ xinference-local -H 0.0.0.0 - diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po index 805aae4904..1a0580b1e8 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-28 11:54+0800\n" +"POT-Creation-Date: 2026-01-29 11:03+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -46,24 +46,6 @@ msgstr "保证 CUDA 在机器上正确安装。可以使用 ``nvidia-smi`` 检 #: ../../source/getting_started/using_docker_image.rst:14 msgid "" -"For CUDA version < 12.8, CUDA version in the docker image is ``12.4``, " -"and the CUDA version on the host machine should be ``12.4`` or above, and" -" the NVIDIA driver version should be ``550`` or above." 
-msgstr "" -"对于 CUDA 版本小于 12.8,镜像中的 CUDA 版本为 ``12.4`` 。为了不出现预期之外的问题,请将宿主机的 CUDA 版本和 " -"NVIDIA Driver 版本分别升级到 ``12.4`` 和 ``550`` 以上。" - -#: ../../source/getting_started/using_docker_image.rst:15 -msgid "" -"For CUDA version >= 12.8 and <12.9, CUDA version in the docker image is " -"``12.8``, and the CUDA version on the host machine should be ``12.8`` or " -"above, and the NVIDIA driver version should be ``570`` or above." -msgstr "" -"对于 CUDA 版本 >= 12.8 且 < 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.8``。宿主机上的 CUDA " -"版本需为 ``12.8`` 或以上,同时 NVIDIA 驱动版本需为 ``570`` 或以上。" - -#: ../../source/getting_started/using_docker_image.rst:16 -msgid "" "For CUDA version >= 12.9, CUDA version in the docker image is ``12.9``, " "and the CUDA version on the host machine should be ``12.9`` or above, and" " the NVIDIA driver version should be ``575`` or above." @@ -71,7 +53,7 @@ msgstr "" "对于 CUDA 版本 >= 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.9``。宿主机上的 CUDA 版本需为 " "``12.9`` 或以上,同时 NVIDIA 驱动版本需为 ``575`` 或以上。" -#: ../../source/getting_started/using_docker_image.rst:17 +#: ../../source/getting_started/using_docker_image.rst:15 msgid "" "Ensure `NVIDIA Container Toolkit `_ installed." @@ -79,40 +61,40 @@ msgstr "" "请确保已安装 `NVIDIA Container Toolkit `_ 。" -#: ../../source/getting_started/using_docker_image.rst:21 +#: ../../source/getting_started/using_docker_image.rst:19 msgid "Docker Image" msgstr "Docker 镜像" -#: ../../source/getting_started/using_docker_image.rst:22 +#: ../../source/getting_started/using_docker_image.rst:20 msgid "" "The official image of Xinference is available on DockerHub in the " "repository ``xprobe/xinference``. Available tags include:" msgstr "Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。当前可用的标签包括:" -#: ../../source/getting_started/using_docker_image.rst:25 +#: ../../source/getting_started/using_docker_image.rst:23 msgid "" "``nightly-main``: This image is built daily from the `GitHub main branch " "`_ and generally does not " "guarantee stability." 
msgstr "``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定可靠。" -#: ../../source/getting_started/using_docker_image.rst:26 +#: ../../source/getting_started/using_docker_image.rst:24 msgid "" "``v``: This image is built each time a Xinference " "release version is published, and it is typically more stable." msgstr "``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常可以认为是稳定可靠的。" -#: ../../source/getting_started/using_docker_image.rst:27 +#: ../../source/getting_started/using_docker_image.rst:25 msgid "" "``latest``: This image is built with the latest Xinference release " "version." msgstr "``latest``: 这个镜像会在 Xinference 发布时指向最新的发布版本" -#: ../../source/getting_started/using_docker_image.rst:28 +#: ../../source/getting_started/using_docker_image.rst:26 msgid "For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``." msgstr "对于 CPU 版本,增加 ``-cpu`` 后缀,如 ``nightly-main-cpu``。" -#: ../../source/getting_started/using_docker_image.rst:29 +#: ../../source/getting_started/using_docker_image.rst:27 msgid "" "For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. " "(Xinference version should be v1.16.0 at least)" @@ -120,17 +102,17 @@ msgstr "" "对于 CUDA 12.9 版本,增加 ``-cu129`` 后缀,如 ``nightly-main-cu129`` 。(Xinference " "版本需要至少 v1.16.0)" -#: ../../source/getting_started/using_docker_image.rst:33 +#: ../../source/getting_started/using_docker_image.rst:31 msgid "" "Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images " "are officially provided." 
msgstr "" -#: ../../source/getting_started/using_docker_image.rst:37 +#: ../../source/getting_started/using_docker_image.rst:35 msgid "Dockerfile for custom build" msgstr "自定义镜像" -#: ../../source/getting_started/using_docker_image.rst:38 +#: ../../source/getting_started/using_docker_image.rst:36 msgid "" "If you need to build the Xinference image according to your own " "requirements, the source code for the Dockerfile is located at " @@ -143,11 +125,11 @@ msgstr "" "`_" " 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的根目录下。比如:" -#: ../../source/getting_started/using_docker_image.rst:49 +#: ../../source/getting_started/using_docker_image.rst:47 msgid "Image usage" msgstr "使用镜像" -#: ../../source/getting_started/using_docker_image.rst:50 +#: ../../source/getting_started/using_docker_image.rst:48 msgid "" "You can start Xinference in the container like this, simultaneously " "mapping port 9997 in the container to port 9998 on the host, enabling " @@ -156,43 +138,43 @@ msgstr "" "你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的 9998 端口,并且指定日志级别为 " "DEBUG,也可以指定需要的环境变量。" -#: ../../source/getting_started/using_docker_image.rst:58 +#: ../../source/getting_started/using_docker_image.rst:56 msgid "" "The option ``--gpus`` is essential and cannot be omitted, because as " "mentioned earlier, the image requires the host machine to have a GPU. " "Otherwise, errors will occur." msgstr "``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会出现错误。" -#: ../../source/getting_started/using_docker_image.rst:59 +#: ../../source/getting_started/using_docker_image.rst:57 msgid "" "The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command " "cannot be omitted. Otherwise, the host machine may not be able to access " "the port inside the container." 
msgstr "``-H 0.0.0.0`` 也是必须指定的,否则在容器外无法连接到 Xinference 服务。" -#: ../../source/getting_started/using_docker_image.rst:60 +#: ../../source/getting_started/using_docker_image.rst:58 msgid "" "You can add multiple ``-e`` options to introduce multiple environment " "variables." msgstr "可以指定多个 ``-e`` 选项赋值多个环境变量。" -#: ../../source/getting_started/using_docker_image.rst:63 +#: ../../source/getting_started/using_docker_image.rst:61 msgid "" "Certainly, if you prefer, you can also manually enter the docker " "container and start Xinference in any desired way." msgstr "当然,也可以运行容器后,进入容器内手动拉起 Xinference。" -#: ../../source/getting_started/using_docker_image.rst:67 +#: ../../source/getting_started/using_docker_image.rst:65 msgid "" "For multiple GPUs, make sure to set the shared memory size, for example: " "`docker run --shm-size=128g ...`" msgstr "对于多张 GPU,确保设置共享内存大小,例如:`docker run --shm-size=128g ...`" -#: ../../source/getting_started/using_docker_image.rst:71 +#: ../../source/getting_started/using_docker_image.rst:69 msgid "Mount your volume for loading and saving models" msgstr "挂载模型目录" -#: ../../source/getting_started/using_docker_image.rst:72 +#: ../../source/getting_started/using_docker_image.rst:70 msgid "" "The image does not contain any model files by default, and it downloads " "the models into the container. 
Typically, you would need to mount a " @@ -204,7 +186,7 @@ msgstr "" "默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下,需要在运行容器时指定本地卷,并且为" " Xinference 配置环境变量。" -#: ../../source/getting_started/using_docker_image.rst:81 +#: ../../source/getting_started/using_docker_image.rst:79 msgid "" "The principle behind the above command is to mount the specified " "directory from the host machine into the container, and then set the " @@ -219,7 +201,7 @@ msgstr "" "环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上指定的目录中。您无需担心在 Docker " "容器停止时丢失这些文件,下次运行容器时,您可以直接使用现有的模型,无需重复下载。" -#: ../../source/getting_started/using_docker_image.rst:85 +#: ../../source/getting_started/using_docker_image.rst:83 msgid "" "If you downloaded the model using the default path on the host machine, " "and since the xinference cache directory stores the model using symbolic " diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index 02a3919401..0cf57e47df 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-28 15:00+0800\n" +"POT-Creation-Date: 2026-01-29 11:03+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -61,8 +61,8 @@ msgid "" "``XINFERENCE_ENABLE_VIRTUAL_ENV=1``." 
msgstr "通过设置环境变量 ``XINFERENCE_ENABLE_VIRTUAL_ENV=1`` 启用该功能。" -#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:209 -#: ../../source/models/virtualenv.rst:225 +#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:207 +#: ../../source/models/virtualenv.rst:223 msgid "Example usage:" msgstr "使用示例:" @@ -181,16 +181,16 @@ msgstr "" "virtualenv / v4 / {model_name} / {model_engine} / {python_version}" #: ../../source/models/virtualenv.rst:116 -msgid "Experimental Feature" -msgstr "实验功能" +msgid "Skip Installed Libraries" +msgstr "跳过已安装的库" -#: ../../source/models/virtualenv.rst:125 +#: ../../source/models/virtualenv.rst:122 msgid "" "This feature requires ``xoscar >= 0.7.12``, which is the minimum Xoscar " "version required for Xinference v1.8.1." msgstr "此功能要求 ``xoscar >= 0.7.12``,这是 Xinference v1.8.1 需要的最低 Xoscar 版本。" -#: ../../source/models/virtualenv.rst:127 +#: ../../source/models/virtualenv.rst:124 msgid "" "``xinference`` uses the ``uv`` tool to create virtual environments, with " "the current Python **system site-packages** set as the base environment. " @@ -203,7 +203,7 @@ msgstr "" "设置为基础环境。默认情况下,``uv`` " "**不会检查系统环境中是否已有包**,而是会在虚拟环境中重新安装所有依赖。这种方式可以更好地与系统包隔离,但可能导致重复安装、初始化时间变长以及磁盘占用增加。" -#: ../../source/models/virtualenv.rst:131 +#: ../../source/models/virtualenv.rst:128 msgid "" "Starting from ``v1.8.1``, an **experimental feature** is available: by " "setting the environment variable " @@ -214,79 +214,81 @@ msgstr "" "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``,``uv`` 将会 **跳过系统 site-" "packages 中已存在的包**。" -#: ../../source/models/virtualenv.rst:136 +#: ../../source/models/virtualenv.rst:133 msgid "" -"The feature is currently disabled but will be enabled by default in " -"``v2.0.0``." -msgstr "该功能当前默认关闭,但将在 ``v2.0.0`` 版本中默认启用。" +"This feature is enabled by default in ``v2.0.0``. To disable it, set " +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``." 
+msgstr "" +"此功能在``v2.0.0``版本中默认启用。若需禁用,请设置" +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" -#: ../../source/models/virtualenv.rst:139 +#: ../../source/models/virtualenv.rst:137 msgid "Advantages" msgstr "优势" -#: ../../source/models/virtualenv.rst:141 +#: ../../source/models/virtualenv.rst:139 msgid "" "Avoid redundant installations of large dependencies (e.g., ``torch`` + " "``CUDA``)." msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA``)。" -#: ../../source/models/virtualenv.rst:142 +#: ../../source/models/virtualenv.rst:140 msgid "Speed up virtual environment creation." msgstr "加快虚拟环境创建速度。" -#: ../../source/models/virtualenv.rst:143 +#: ../../source/models/virtualenv.rst:141 msgid "Reduce disk usage." msgstr "减少磁盘空间占用。" -#: ../../source/models/virtualenv.rst:146 +#: ../../source/models/virtualenv.rst:144 msgid "Usage" msgstr "使用" -#: ../../source/models/virtualenv.rst:158 +#: ../../source/models/virtualenv.rst:156 msgid "Performance Comparison" msgstr "性能对比" -#: ../../source/models/virtualenv.rst:160 +#: ../../source/models/virtualenv.rst:158 msgid "Using the ``CosyVoice 0.5B`` model as an example:" msgstr "以 ``CosyVoice 0.5B`` 模型为例:" -#: ../../source/models/virtualenv.rst:162 +#: ../../source/models/virtualenv.rst:160 msgid "**Without this feature enabled**::" msgstr "**未开启该功能时**::" -#: ../../source/models/virtualenv.rst:173 +#: ../../source/models/virtualenv.rst:171 msgid "**With this feature enabled**::" msgstr "**开启该功能后**::" -#: ../../source/models/virtualenv.rst:188 +#: ../../source/models/virtualenv.rst:186 msgid "Model Launching: Toggle Virtual Environments and Customize Dependencies" msgstr "模型加载:开关虚拟环境并自定义依赖" -#: ../../source/models/virtualenv.rst:192 +#: ../../source/models/virtualenv.rst:190 msgid "" "Starting from v1.8.1, we support toggling the virtual environment for " "individual model launching, as well as overriding the model's default " "settings with custom package dependencies." 
msgstr "从 v1.8.1 开始,我们支持对单个模型加载开关虚拟环境,并用自定义包依赖覆盖模型的默认设置。" -#: ../../source/models/virtualenv.rst:196 +#: ../../source/models/virtualenv.rst:194 msgid "Toggle Virtual Environment" msgstr "开关模型虚拟空间" -#: ../../source/models/virtualenv.rst:198 +#: ../../source/models/virtualenv.rst:196 msgid "" "When loading a model, you can specify whether to enable the model's " "virtual environment. If not specified, the setting will follow the " "environment variable configuration." msgstr "加载模型时,可以指定是否启用模型的虚拟环境。如果未指定,则默认遵循环境变量的配置。" -#: ../../source/models/virtualenv.rst:201 +#: ../../source/models/virtualenv.rst:199 msgid "" "For the Web UI, this can be toggled on or off through the optional " "settings switch." msgstr "在 Web UI 中,可以通过可选设置开关打开或关闭该功能。" -#: ../../source/models/virtualenv.rst:207 +#: ../../source/models/virtualenv.rst:205 msgid "" "For command-line loading, use the ``--enable-virtual-env`` option to " "enable the virtual environment, or ``--disable-virtual-env`` to disable " @@ -295,11 +297,11 @@ msgstr "" "命令行加载时,使用 ``--enable-virtual-env`` 选项启用虚拟环境,使用 ``--disable-virtual-env`` " "选项禁用虚拟环境。" -#: ../../source/models/virtualenv.rst:216 +#: ../../source/models/virtualenv.rst:214 msgid "Set Virtual Environment Package Dependencies" msgstr "设置虚拟环境包依赖" -#: ../../source/models/virtualenv.rst:218 +#: ../../source/models/virtualenv.rst:216 msgid "" "For supported models, Xinference has already defined the package " "dependencies and version requirements within the virtual environment. " @@ -307,25 +309,25 @@ msgid "" " dependencies, you can manually provide them during model loading." msgstr "对于支持的模型,Xinference 已经在虚拟环境中定义了包依赖和版本要求。但如果需要指定特定版本或安装额外依赖,可以在加载模型时手动提供。" -#: ../../source/models/virtualenv.rst:221 +#: ../../source/models/virtualenv.rst:219 msgid "" "In the Web UI, you can add custom dependencies by clicking the plus icon " "in the same location as the virtual environment toggle." 
msgstr "在 Web UI 中,可以在虚拟环境开关同一位置点击加号图标来添加自定义依赖。" -#: ../../source/models/virtualenv.rst:223 +#: ../../source/models/virtualenv.rst:221 msgid "" "For the command line, use ``--virtual-env-package`` or ``-vp`` to specify" " a single package version." msgstr "命令行中,使用 ``--virtual-env-package`` 或 ``-vp`` 来指定单个包版本。" -#: ../../source/models/virtualenv.rst:231 +#: ../../source/models/virtualenv.rst:229 msgid "" "In addition to the standard way of specifying package dependencies, such " "as ``transformers==xxx``, Xinference also supports some extended syntax." msgstr "除了常规的包依赖指定方式(如 ``transformers==xxx``),Xinference 还支持一些扩展语法。" -#: ../../source/models/virtualenv.rst:233 +#: ../../source/models/virtualenv.rst:231 msgid "" "``#system_xxx#``: Using the same version as the system site packages, " "such as ``#system_numpy#``, ensures that the installed package matches " @@ -335,19 +337,21 @@ msgstr "" "``#system_xxx#``:使用与系统 site packages 相同的版本,例如 " "``#system_numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止依赖冲突。" -#: ../../source/models/virtualenv.rst:237 -msgid "Authoring Custom Models (JSON)" -msgstr "创建自定义模型(JSON)" +#: ../../source/models/virtualenv.rst:235 +msgid "ModelHub JSON for Xinference Models" +msgstr "ModelHub JSON 格式(适用于 Xinference 模型)" -#: ../../source/models/virtualenv.rst:239 +#: ../../source/models/virtualenv.rst:237 msgid "" -"When registering a custom model, you can define a ``virtualenv`` block in" -" the model JSON. Starting from v2.0 (v4 flow), **engine-aware markers are" -" recommended** so one JSON can cover multiple engines." +"If you plan to add a model to a model hub for Xinference, define a " +"``virtualenv`` block in the model JSON. Starting from v2.0 (v4 flow), " +"**engine-aware markers are recommended** so one JSON can cover multiple " +"engines." 
msgstr "" -"注册自定义模型时,可在模型 JSON 中定义一个 ``virtualenv`` 块。从 v2.0(v4 流程)开始, **建议使用引擎感知标记** ,以便单个 JSON 文件覆盖多个引擎。" +"若计划将模型添加至Xinference的Model Hub,请在模型JSON中定义一个``virtualenv``块。" +"自v2.0(v4流程)起, **建议使用引擎感知标记** ,以便单个JSON文件覆盖多个引擎。" -#: ../../source/models/virtualenv.rst:243 +#: ../../source/models/virtualenv.rst:241 msgid "" "Important rule: If a new model supports a specific engine, you **must** " "include at least one package entry for that engine in " @@ -355,94 +359,97 @@ msgid "" "\"vllm\"``. Engine availability checks rely on these markers when virtual" " environments are enabled." msgstr "" -"重要规则:若新模型支持特定引擎,则 **必须** 在 ``virtualenv.packages`` 中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` )。" -"当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" +"重要规则:若新模型支持特定引擎,则 **必须** 在 ``virtualenv.packages`` " +"中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` " +")。当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" -#: ../../source/models/virtualenv.rst:270 +#: ../../source/models/virtualenv.rst:268 msgid "``packages`` (required): list of pip requirement strings or markers." msgstr " ``packages`` (必填):pip 要求字符串或标记的列表。" -#: ../../source/models/virtualenv.rst:271 +#: ../../source/models/virtualenv.rst:269 msgid "" "``inherit_pip_config`` (default ``true``): inherit system pip " "configuration if present." msgstr " ``inherit_pip_config`` (默认值为 ``true`` ):若存在系统 pip 配置文件,则继承其设置。" -#: ../../source/models/virtualenv.rst:272 +#: ../../source/models/virtualenv.rst:270 msgid "" "``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``: " "pip index and mirror controls." -msgstr " ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` : " -"pip 索引和镜像控制。" +msgstr "" +" ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` " +": pip 索引和镜像控制。" -#: ../../source/models/virtualenv.rst:274 +#: ../../source/models/virtualenv.rst:272 msgid "" "``index_strategy``: passed through to the virtualenv installer (used by " "some engines)." 
msgstr " ``index_strategy`` :传递给虚拟环境安装程序(由某些引擎使用)。" -#: ../../source/models/virtualenv.rst:275 +#: ../../source/models/virtualenv.rst:273 msgid "``no_build_isolation``: pip build isolation switch for tricky builds." msgstr " ``no_build_isolation`` :用于处理复杂构建的pip构建隔离开关。" -#: ../../source/models/virtualenv.rst:280 +#: ../../source/models/virtualenv.rst:278 msgid "Use wrapped placeholders to inject engine defaults:" msgstr "使用包裹的占位符注入引擎默认值:" -#: ../../source/models/virtualenv.rst:282 +#: ../../source/models/virtualenv.rst:280 msgid "``#vllm_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:283 +#: ../../source/models/virtualenv.rst:281 msgid "``#sglang_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:284 +#: ../../source/models/virtualenv.rst:282 msgid "``#mlx_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:285 +#: ../../source/models/virtualenv.rst:283 msgid "``#transformers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:286 +#: ../../source/models/virtualenv.rst:284 msgid "``#llama_cpp_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:287 +#: ../../source/models/virtualenv.rst:285 msgid "``#diffusers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:288 +#: ../../source/models/virtualenv.rst:286 msgid "``#sentence_transformers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:293 +#: ../../source/models/virtualenv.rst:291 msgid "" "Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-" "sensitive). Engine values are passed in lowercase internally, so prefer " "lowercase values, for example ``#engine# == \"vllm\"`` or ``#engine# == " "\"transformers\"``." 
msgstr "" -"标记使用 ``#engine#`` 或 ``#model_engine#`` 进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值" -",例如 ``#engine# == \"vllm\"`` 或 ``#engine# == \"transformers\"`` 。" +"标记使用 ``#engine#`` 或 ``#model_engine#`` " +"进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值,例如 ``#engine# == \"vllm\"`` 或 " +"``#engine# == \"transformers\"`` 。" -#: ../../source/models/virtualenv.rst:301 +#: ../../source/models/virtualenv.rst:299 msgid "Manage Virtual Enviroments" msgstr "虚拟环境管理" -#: ../../source/models/virtualenv.rst:305 +#: ../../source/models/virtualenv.rst:303 msgid "" "Xinference provides comprehensive virtual environment management for " "model dependencies, allowing you to create isolated Python environments " "for each model with specific package requirements." msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" -#: ../../source/models/virtualenv.rst:317 +#: ../../source/models/virtualenv.rst:315 msgid "Key Features" msgstr "核心功能" -#: ../../source/models/virtualenv.rst:319 +#: ../../source/models/virtualenv.rst:317 msgid "" "**Multiple Python Version Support**: Each model can have virtual " "environments with different Python versions (e.g., Python 3.10.18, " @@ -451,23 +458,23 @@ msgstr "" "**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " "3.10.18、3.11.5),实现与各种模型要求的兼容性。" -#: ../../source/models/virtualenv.rst:324 +#: ../../source/models/virtualenv.rst:322 msgid "" "**Isolated Dependencies**: Each virtual environment contains its own set " "of packages, preventing conflicts between different models' requirements." msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" -#: ../../source/models/virtualenv.rst:329 +#: ../../source/models/virtualenv.rst:327 msgid "Management Operations" msgstr "管理操作" -#: ../../source/models/virtualenv.rst:331 +#: ../../source/models/virtualenv.rst:329 msgid "" "**Listing Virtual Environments**: View all virtual environments across " "your cluster, filtered by model name or worker IP address." 
msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" -#: ../../source/models/virtualenv.rst:335 +#: ../../source/models/virtualenv.rst:333 msgid "" "**Creating Environments**: Automatically created when launching models " "with enable_virtual_env=true. The system detects your current Python " @@ -476,7 +483,7 @@ msgstr "" "**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python " "版本并创建包含所需包的独立环境。" -#: ../../source/models/virtualenv.rst:340 +#: ../../source/models/virtualenv.rst:338 msgid "" "**Removing Environments**: Delete specific virtual environments by model " "name and optionally Python version, or remove all environments for a " diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po index 4c89e64b9c..951e4bb557 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/user_guide/distributed_inference.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-28 11:54+0800\n" +"POT-Creation-Date: 2026-01-29 11:03+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -95,11 +95,13 @@ msgstr "" msgid "" "vLLM (v0.11.0+) note: Starting from vLLM v0.11.0, distributed deployment " "with vLLM requires Xinference >= v1.17.1. In addition to setting " -"``--n-worker`` as before, you must also set ``tensor_parallel_size=2`` " -"and ``pipeline_parallel_size=1`` when launching the model." +"``--n-worker`` as before, you must also set ``tensor_parallel_size`` (set" +" it to the **GPU count**) and ``pipeline_parallel_size=1`` when launching" +" the model." 
msgstr "" -"vLLM(v0.11.0+)注意事项:从vLLM v0.11.0版本开始,使用vLLM进行分布式部署需要Xinference >= v1.17.1版本。" -"除原有的 ``--n-worker`` 参数设置外,启动模型时还必须同时设置 ``tensor_parallel_size=2`` 和 ``pipeline_parallel_size=1`` 参数。" +"vLLM(v0.11.0+)注意事项:从vLLM v0.11.0版本开始,使用vLLM进行分布式部署需要Xinference >= " +"v1.17.1版本。除原有的 ``--n-worker`` 参数设置外,启动模型时还必须同时设置 " +"``tensor_parallel_size`` (将其设置为 **GPU数量** ) 和 ``pipeline_parallel_size=1`` 参数。" #: ../../source/user_guide/distributed_inference.rst:40 msgid "" diff --git a/doc/source/models/custom.rst b/doc/source/models/custom.rst index 27f70b939e..031d6e24ad 100644 --- a/doc/source/models/custom.rst +++ b/doc/source/models/custom.rst @@ -58,6 +58,8 @@ Define a custom model Web UI: Automatic LLM Config Parsing ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. versionadded:: v2.0.0 + When registering a custom LLM via the Web UI, Xinference can automatically parse the model configuration and pre-fill key fields for you. diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst index 5de5766142..d129505d53 100644 --- a/doc/source/models/virtualenv.rst +++ b/doc/source/models/virtualenv.rst @@ -47,7 +47,7 @@ Example usage: Xinference will by default inherit the config for current pip. -.. note:: +.. versionchanged:: v2.0.0 Starting from **Xinference v2.0**, the model virtual environment feature is enabled by default (i.e., ``XINFERENCE_ENABLE_VIRTUAL_ENV`` defaults to ``1``). @@ -112,14 +112,11 @@ By default, the model’s virtual environment is stored under path: * Since v1.14.0: :ref:`XINFERENCE_HOME ` / virtualenv / v3 / {model_name} / {python_version} * Since v2.0: :ref:`XINFERENCE_HOME ` / virtualenv / v4 / {model_name} / {model_engine} / {python_version} -Experimental Feature -#################### +Skip Installed Libraries +######################## .. _skip_installed_libraries: -Skip Installed Libraries ------------------------- - .. 
versionadded:: v1.8.1 This feature requires ``xoscar >= 0.7.12``, which is the minimum Xoscar version required for Xinference v1.8.1. @@ -131,9 +128,10 @@ This ensures better isolation from system packages but can result in redundant i Starting from ``v1.8.1``, an **experimental feature** is available: by setting the environment variable ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``, ``uv`` will **skip packages already available in system site-packages**. -.. note:: +.. versionchanged:: v2.0.0 - The feature is currently disabled but will be enabled by default in ``v2.0.0``. + This feature is enabled by default in ``v2.0.0``. To disable it, set + ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``. Advantages ~~~~~~~~~~ @@ -233,12 +231,12 @@ In addition to the standard way of specifying package dependencies, such as ``tr * ``#system_xxx#``: Using the same version as the system site packages, such as ``#system_numpy#``, ensures that the installed package matches the system site package version of numpy. This helps prevent dependency conflicts. -Authoring Custom Models (JSON) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +ModelHub JSON for Xinference Models +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When registering a custom model, you can define a ``virtualenv`` block in the model JSON. -Starting from v2.0 (v4 flow), **engine-aware markers are recommended** so one JSON can cover -multiple engines. +If you plan to add a model to a model hub for Xinference, define a ``virtualenv`` block +in the model JSON. Starting from v2.0 (v4 flow), **engine-aware markers are recommended** +so one JSON can cover multiple engines. 
Important rule: If a new model supports a specific engine, you **must** include at least one package diff --git a/doc/source/user_guide/distributed_inference.rst b/doc/source/user_guide/distributed_inference.rst index 343961f5dd..bd22aaee46 100644 --- a/doc/source/user_guide/distributed_inference.rst +++ b/doc/source/user_guide/distributed_inference.rst @@ -35,7 +35,7 @@ to create a Xinference cluster including supervisor and workers. vLLM (v0.11.0+) note: Starting from vLLM v0.11.0, distributed deployment with vLLM requires Xinference >= v1.17.1. In addition to setting ``--n-worker`` as before, you must also set -``tensor_parallel_size=2`` and ``pipeline_parallel_size=1`` when launching the model. +``tensor_parallel_size`` (set it to the **GPU count**) and ``pipeline_parallel_size=1`` when launching the model. Then if are using web UI, choose expected machines for ``worker count`` in the optional configurations, if you are using command line, add ``--n-worker `` when launching a model. From c7e2d4e30a0a97dec6e5d0b33398f16981919266 Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 14:36:48 +0800 Subject: [PATCH 3/8] fix docker doc --- .../getting_started/using_docker_image.rst | 11 ++-- .../getting_started/using_docker_image.po | 64 ++++++++++--------- 2 files changed, 41 insertions(+), 34 deletions(-) diff --git a/doc/source/getting_started/using_docker_image.rst b/doc/source/getting_started/using_docker_image.rst index 923c9ad5d3..162f5f2a91 100644 --- a/doc/source/getting_started/using_docker_image.rst +++ b/doc/source/getting_started/using_docker_image.rst @@ -20,15 +20,16 @@ Docker Image The official image of Xinference is available on DockerHub in the repository ``xprobe/xinference``. Available tags include: +.. versionchanged:: v2.0 + + Starting from **Xinference v2.0**, only two image variants are provided: + the default (no suffix, **CUDA 12.9**) and the ``-cpu`` image. 
+ * ``nightly-main``: This image is built daily from the `GitHub main branch `_ and generally does not guarantee stability. * ``v``: This image is built each time a Xinference release version is published, and it is typically more stable. * ``latest``: This image is built with the latest Xinference release version. * For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``. -* For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. (Xinference version should be v1.16.0 at least) - -.. versionchanged:: v2.0.0 - - Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images are officially provided. +* For CUDA 12.9, no suffix, e.g. ``nightly-main``. (Xinference version should be v2.0 at least) Dockerfile for custom build diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po index 1a0580b1e8..bab6e0f1e6 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-29 11:03+0800\n" +"POT-Creation-Date: 2026-01-29 14:34+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -71,48 +71,48 @@ msgid "" "repository ``xprobe/xinference``. Available tags include:" msgstr "Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。当前可用的标签包括:" -#: ../../source/getting_started/using_docker_image.rst:23 +#: ../../source/getting_started/using_docker_image.rst:25 +msgid "" +"Starting from **Xinference v2.0**, only two image variants are provided: " +"the default (no suffix, **CUDA 12.9**) and the ``-cpu`` image." 
+msgstr "" +"从 **Xinference v2.0** 开始,仅提供两种镜像变体:默认镜像(无后缀, **CUDA 12.9** )和 ``-cpu`` 镜像。" + +#: ../../source/getting_started/using_docker_image.rst:28 msgid "" "``nightly-main``: This image is built daily from the `GitHub main branch " "`_ and generally does not " "guarantee stability." msgstr "``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定可靠。" -#: ../../source/getting_started/using_docker_image.rst:24 +#: ../../source/getting_started/using_docker_image.rst:29 msgid "" "``v``: This image is built each time a Xinference " "release version is published, and it is typically more stable." msgstr "``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常可以认为是稳定可靠的。" -#: ../../source/getting_started/using_docker_image.rst:25 +#: ../../source/getting_started/using_docker_image.rst:30 msgid "" "``latest``: This image is built with the latest Xinference release " "version." msgstr "``latest``: 这个镜像会在 Xinference 发布时指向最新的发布版本" -#: ../../source/getting_started/using_docker_image.rst:26 +#: ../../source/getting_started/using_docker_image.rst:31 msgid "For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``." msgstr "对于 CPU 版本,增加 ``-cpu`` 后缀,如 ``nightly-main-cpu``。" -#: ../../source/getting_started/using_docker_image.rst:27 +#: ../../source/getting_started/using_docker_image.rst:32 msgid "" -"For CUDA 12.9, add ``-cu129`` suffix, e.g. ``nightly-main-cu129``. " -"(Xinference version should be v1.16.0 at least)" +"For CUDA 12.9, no suffix, e.g. ``nightly-main``. (Xinference version " +"should be v2.0 at least)" msgstr "" -"对于 CUDA 12.9 版本,增加 ``-cu129`` 后缀,如 ``nightly-main-cu129`` 。(Xinference " -"版本需要至少 v1.16.0)" +"对于 CUDA 12.9,不带后缀,例如 ``nightly-main`` 。(Xinference 版本应至少为 v2.0)" -#: ../../source/getting_started/using_docker_image.rst:31 -msgid "" -"Starting from **Xinference v2.0**, only ``-cu129`` and ``-cpu`` images " -"are officially provided." 
-msgstr "" - -#: ../../source/getting_started/using_docker_image.rst:35 +#: ../../source/getting_started/using_docker_image.rst:36 msgid "Dockerfile for custom build" msgstr "自定义镜像" -#: ../../source/getting_started/using_docker_image.rst:36 +#: ../../source/getting_started/using_docker_image.rst:37 msgid "" "If you need to build the Xinference image according to your own " "requirements, the source code for the Dockerfile is located at " @@ -125,11 +125,11 @@ msgstr "" "`_" " 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的根目录下。比如:" -#: ../../source/getting_started/using_docker_image.rst:47 +#: ../../source/getting_started/using_docker_image.rst:48 msgid "Image usage" msgstr "使用镜像" -#: ../../source/getting_started/using_docker_image.rst:48 +#: ../../source/getting_started/using_docker_image.rst:49 msgid "" "You can start Xinference in the container like this, simultaneously " "mapping port 9997 in the container to port 9998 on the host, enabling " @@ -138,43 +138,43 @@ msgstr "" "你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的 9998 端口,并且指定日志级别为 " "DEBUG,也可以指定需要的环境变量。" -#: ../../source/getting_started/using_docker_image.rst:56 +#: ../../source/getting_started/using_docker_image.rst:57 msgid "" "The option ``--gpus`` is essential and cannot be omitted, because as " "mentioned earlier, the image requires the host machine to have a GPU. " "Otherwise, errors will occur." msgstr "``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会出现错误。" -#: ../../source/getting_started/using_docker_image.rst:57 +#: ../../source/getting_started/using_docker_image.rst:58 msgid "" "The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command " "cannot be omitted. Otherwise, the host machine may not be able to access " "the port inside the container." 
msgstr "``-H 0.0.0.0`` 也是必须指定的,否则在容器外无法连接到 Xinference 服务。" -#: ../../source/getting_started/using_docker_image.rst:58 +#: ../../source/getting_started/using_docker_image.rst:59 msgid "" "You can add multiple ``-e`` options to introduce multiple environment " "variables." msgstr "可以指定多个 ``-e`` 选项赋值多个环境变量。" -#: ../../source/getting_started/using_docker_image.rst:61 +#: ../../source/getting_started/using_docker_image.rst:62 msgid "" "Certainly, if you prefer, you can also manually enter the docker " "container and start Xinference in any desired way." msgstr "当然,也可以运行容器后,进入容器内手动拉起 Xinference。" -#: ../../source/getting_started/using_docker_image.rst:65 +#: ../../source/getting_started/using_docker_image.rst:66 msgid "" "For multiple GPUs, make sure to set the shared memory size, for example: " "`docker run --shm-size=128g ...`" msgstr "对于多张 GPU,确保设置共享内存大小,例如:`docker run --shm-size=128g ...`" -#: ../../source/getting_started/using_docker_image.rst:69 +#: ../../source/getting_started/using_docker_image.rst:70 msgid "Mount your volume for loading and saving models" msgstr "挂载模型目录" -#: ../../source/getting_started/using_docker_image.rst:70 +#: ../../source/getting_started/using_docker_image.rst:71 msgid "" "The image does not contain any model files by default, and it downloads " "the models into the container. 
Typically, you would need to mount a " @@ -186,7 +186,7 @@ msgstr "" "默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下,需要在运行容器时指定本地卷,并且为" " Xinference 配置环境变量。" -#: ../../source/getting_started/using_docker_image.rst:79 +#: ../../source/getting_started/using_docker_image.rst:80 msgid "" "The principle behind the above command is to mount the specified " "directory from the host machine into the container, and then set the " @@ -201,7 +201,7 @@ msgstr "" "环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上指定的目录中。您无需担心在 Docker " "容器停止时丢失这些文件,下次运行容器时,您可以直接使用现有的模型,无需重复下载。" -#: ../../source/getting_started/using_docker_image.rst:83 +#: ../../source/getting_started/using_docker_image.rst:84 msgid "" "If you downloaded the model using the default path on the host machine, " "and since the xinference cache directory stores the model using symbolic " @@ -227,3 +227,9 @@ msgstr "" #~ "``nightly-main-cu128`` 。(Xinference 版本需要介于 " #~ "v1.8.1 和 v1.15.0)" +#~ msgid "" +#~ "Starting from **Xinference v2.0**, only " +#~ "``-cu129`` and ``-cpu`` images are " +#~ "officially provided." +#~ msgstr "" + From 3e55efc3ad30363cc11bba43ac3001687af16aa5 Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 16:52:37 +0800 Subject: [PATCH 4/8] fix doc bug --- .../getting_started/using_docker_image.rst | 9 +- .../getting_started/using_docker_image.po | 84 ++++++++++--------- .../zh_CN/LC_MESSAGES/models/virtualenv.po | 6 +- doc/source/models/virtualenv.rst | 2 +- 4 files changed, 53 insertions(+), 48 deletions(-) diff --git a/doc/source/getting_started/using_docker_image.rst b/doc/source/getting_started/using_docker_image.rst index 162f5f2a91..8786ca9130 100644 --- a/doc/source/getting_started/using_docker_image.rst +++ b/doc/source/getting_started/using_docker_image.rst @@ -6,6 +6,9 @@ Xinference Docker Image Xinference provides official images for use on Dockerhub. +.. 
versionchanged:: v2.0 + + Starting from **Xinference v2.0**, to use the CUDA version of the image, the minimum CUDA version must be **CUDA 12.9**. Prerequisites ============= @@ -20,16 +23,10 @@ Docker Image The official image of Xinference is available on DockerHub in the repository ``xprobe/xinference``. Available tags include: -.. versionchanged:: v2.0 - - Starting from **Xinference v2.0**, only two image variants are provided: - the default (no suffix, **CUDA 12.9**) and the ``-cpu`` image. - * ``nightly-main``: This image is built daily from the `GitHub main branch `_ and generally does not guarantee stability. * ``v``: This image is built each time a Xinference release version is published, and it is typically more stable. * ``latest``: This image is built with the latest Xinference release version. * For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``. -* For CUDA 12.9, no suffix, e.g. ``nightly-main``. (Xinference version should be v2.0 at least) Dockerfile for custom build diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po index bab6e0f1e6..261d1bf634 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/getting_started/using_docker_image.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-29 14:34+0800\n" +"POT-Creation-Date: 2026-01-29 16:49+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -28,23 +28,30 @@ msgid "Xinference provides official images for use on Dockerhub." msgstr "Xinference 在 Dockerhub 和 阿里云容器镜像服务 中上传了官方镜像。" #: ../../source/getting_started/using_docker_image.rst:11 +msgid "" +"Starting from **Xinference v2.0**, to use the CUDA version of the image, " +"the minimum CUDA version must be **CUDA 12.9**." 
+msgstr ""
+"从 **Xinference v2.0** 开始,如果要使用 CUDA 版本的镜像,CUDA 版本最低要达到 **CUDA 12.9** 。"
+
+#: ../../source/getting_started/using_docker_image.rst:14
 msgid "Prerequisites"
 msgstr "准备工作"

-#: ../../source/getting_started/using_docker_image.rst:12
+#: ../../source/getting_started/using_docker_image.rst:15
 msgid ""
 "The image can only run in an environment with GPUs and CUDA installed, "
 "because Xinference in the image relies on Nvidia GPUs for acceleration."
 msgstr "Xinference 使用 GPU 加速推理,该镜像需要在有 GPU 显卡并且安装 CUDA 的机器上运行。"

-#: ../../source/getting_started/using_docker_image.rst:13
+#: ../../source/getting_started/using_docker_image.rst:16
 msgid ""
 "CUDA must be successfully installed on the host machine. This can be "
 "determined by whether you can successfully execute the ``nvidia-smi`` "
 "command."
 msgstr "保证 CUDA 在机器上正确安装。可以使用 ``nvidia-smi`` 检查是否正确运行。"

-#: ../../source/getting_started/using_docker_image.rst:14
+#: ../../source/getting_started/using_docker_image.rst:17
 msgid ""
 "For CUDA version >= 12.9, CUDA version in the docker image is ``12.9``, "
 "and the CUDA version on the host machine should be ``12.9`` or above, and"
 " the NVIDIA driver version should be ``575`` or above."
 msgstr ""
 "对于 CUDA 版本 >= 12.9,Docker 镜像中使用的 CUDA 版本为 ``12.9``。宿主机上的 CUDA 版本需为 "
 "``12.9`` 或以上,同时 NVIDIA 驱动版本需为 ``575`` 或以上。"

-#: ../../source/getting_started/using_docker_image.rst:15
+#: ../../source/getting_started/using_docker_image.rst:18
 msgid ""
 "Ensure `NVIDIA Container Toolkit `_ installed."
 msgstr ""
 "请确保已安装 `NVIDIA Container Toolkit `_ 。"

-#: ../../source/getting_started/using_docker_image.rst:19
+#: ../../source/getting_started/using_docker_image.rst:22
 msgid "Docker Image"
 msgstr "Docker 镜像"

-#: ../../source/getting_started/using_docker_image.rst:20
+#: ../../source/getting_started/using_docker_image.rst:23
 msgid ""
 "The official image of Xinference is available on DockerHub in the "
 "repository ``xprobe/xinference``. 
Available tags include:" msgstr "Xinference 官方镜像已发布在 DockerHub 上的 ``xprobe/xinference`` 仓库中。当前可用的标签包括:" -#: ../../source/getting_started/using_docker_image.rst:25 -msgid "" -"Starting from **Xinference v2.0**, only two image variants are provided: " -"the default (no suffix, **CUDA 12.9**) and the ``-cpu`` image." -msgstr "" -"从 **Xinference v2.0** 开始,仅提供两种镜像变体:默认镜像(无后缀, **CUDA 12.9** )和 ``-cpu`` 镜像。" - -#: ../../source/getting_started/using_docker_image.rst:28 +#: ../../source/getting_started/using_docker_image.rst:26 msgid "" "``nightly-main``: This image is built daily from the `GitHub main branch " "`_ and generally does not " "guarantee stability." msgstr "``nightly-main``: 这个镜像会每天从 GitHub main 分支更新制作,不保证稳定可靠。" -#: ../../source/getting_started/using_docker_image.rst:29 +#: ../../source/getting_started/using_docker_image.rst:27 msgid "" "``v``: This image is built each time a Xinference " "release version is published, and it is typically more stable." msgstr "``v``: 这个镜像会在 Xinference 每次发布的时候制作,通常可以认为是稳定可靠的。" -#: ../../source/getting_started/using_docker_image.rst:30 +#: ../../source/getting_started/using_docker_image.rst:28 msgid "" "``latest``: This image is built with the latest Xinference release " "version." msgstr "``latest``: 这个镜像会在 Xinference 发布时指向最新的发布版本" -#: ../../source/getting_started/using_docker_image.rst:31 +#: ../../source/getting_started/using_docker_image.rst:29 msgid "For CPU version, add ``-cpu`` suffix, e.g. ``nightly-main-cpu``." msgstr "对于 CPU 版本,增加 ``-cpu`` 后缀,如 ``nightly-main-cpu``。" -#: ../../source/getting_started/using_docker_image.rst:32 -msgid "" -"For CUDA 12.9, no suffix, e.g. ``nightly-main``. 
(Xinference version " -"should be v2.0 at least)" -msgstr "" -"对于 CUDA 12.9,不带后缀,例如 ``nightly-main`` 。(Xinference 版本应至少为 v2.0)" - -#: ../../source/getting_started/using_docker_image.rst:36 +#: ../../source/getting_started/using_docker_image.rst:33 msgid "Dockerfile for custom build" msgstr "自定义镜像" -#: ../../source/getting_started/using_docker_image.rst:37 +#: ../../source/getting_started/using_docker_image.rst:34 msgid "" "If you need to build the Xinference image according to your own " "requirements, the source code for the Dockerfile is located at " @@ -125,11 +118,11 @@ msgstr "" "`_" " 。请确保使用 Dockerfile 制作镜像时在 Xinference 项目的根目录下。比如:" -#: ../../source/getting_started/using_docker_image.rst:48 +#: ../../source/getting_started/using_docker_image.rst:45 msgid "Image usage" msgstr "使用镜像" -#: ../../source/getting_started/using_docker_image.rst:49 +#: ../../source/getting_started/using_docker_image.rst:46 msgid "" "You can start Xinference in the container like this, simultaneously " "mapping port 9997 in the container to port 9998 on the host, enabling " @@ -138,43 +131,43 @@ msgstr "" "你可以使用如下方式在容器内启动 Xinference,同时将 9997 端口映射到宿主机的 9998 端口,并且指定日志级别为 " "DEBUG,也可以指定需要的环境变量。" -#: ../../source/getting_started/using_docker_image.rst:57 +#: ../../source/getting_started/using_docker_image.rst:54 msgid "" "The option ``--gpus`` is essential and cannot be omitted, because as " "mentioned earlier, the image requires the host machine to have a GPU. " "Otherwise, errors will occur." msgstr "``--gpus`` 必须指定,正如前文描述,镜像必须运行在有 GPU 的机器上,否则会出现错误。" -#: ../../source/getting_started/using_docker_image.rst:58 +#: ../../source/getting_started/using_docker_image.rst:55 msgid "" "The ``-H 0.0.0.0`` parameter after the ``xinference-local`` command " "cannot be omitted. Otherwise, the host machine may not be able to access " "the port inside the container." 
msgstr "``-H 0.0.0.0`` 也是必须指定的,否则在容器外无法连接到 Xinference 服务。" -#: ../../source/getting_started/using_docker_image.rst:59 +#: ../../source/getting_started/using_docker_image.rst:56 msgid "" "You can add multiple ``-e`` options to introduce multiple environment " "variables." msgstr "可以指定多个 ``-e`` 选项赋值多个环境变量。" -#: ../../source/getting_started/using_docker_image.rst:62 +#: ../../source/getting_started/using_docker_image.rst:59 msgid "" "Certainly, if you prefer, you can also manually enter the docker " "container and start Xinference in any desired way." msgstr "当然,也可以运行容器后,进入容器内手动拉起 Xinference。" -#: ../../source/getting_started/using_docker_image.rst:66 +#: ../../source/getting_started/using_docker_image.rst:63 msgid "" "For multiple GPUs, make sure to set the shared memory size, for example: " "`docker run --shm-size=128g ...`" msgstr "对于多张 GPU,确保设置共享内存大小,例如:`docker run --shm-size=128g ...`" -#: ../../source/getting_started/using_docker_image.rst:70 +#: ../../source/getting_started/using_docker_image.rst:67 msgid "Mount your volume for loading and saving models" msgstr "挂载模型目录" -#: ../../source/getting_started/using_docker_image.rst:71 +#: ../../source/getting_started/using_docker_image.rst:68 msgid "" "The image does not contain any model files by default, and it downloads " "the models into the container. 
Typically, you would need to mount a " @@ -186,7 +179,7 @@ msgstr "" "默认情况下,镜像中不包含任何模型文件,使用过程中会在容器内下载模型。如果需要使用已经下载好的模型,需要将宿主机的目录挂载到容器内。这种情况下,需要在运行容器时指定本地卷,并且为" " Xinference 配置环境变量。" -#: ../../source/getting_started/using_docker_image.rst:80 +#: ../../source/getting_started/using_docker_image.rst:77 msgid "" "The principle behind the above command is to mount the specified " "directory from the host machine into the container, and then set the " @@ -201,7 +194,7 @@ msgstr "" "环境变量指向容器内的该目录。这样,所有下载的模型文件将存储在您在主机上指定的目录中。您无需担心在 Docker " "容器停止时丢失这些文件,下次运行容器时,您可以直接使用现有的模型,无需重复下载。" -#: ../../source/getting_started/using_docker_image.rst:84 +#: ../../source/getting_started/using_docker_image.rst:81 msgid "" "If you downloaded the model using the default path on the host machine, " "and since the xinference cache directory stores the model using symbolic " @@ -233,3 +226,18 @@ msgstr "" #~ "officially provided." #~ msgstr "" +#~ msgid "" +#~ "Starting from **Xinference v2.0**, only " +#~ "two image variants are provided: the " +#~ "default (no suffix, **CUDA 12.9**) and" +#~ " the ``-cpu`` image." +#~ msgstr "" +#~ "从 **Xinference v2.0** 开始,仅提供两种镜像变体:默认镜像(无后缀, " +#~ "**CUDA 12.9** )和 ``-cpu`` 镜像。" + +#~ msgid "" +#~ "For CUDA 12.9, no suffix, e.g. " +#~ "``nightly-main``. (Xinference version should " +#~ "be v2.0 at least)" +#~ msgstr "对于 CUDA 12.9,不带后缀,例如 ``nightly-main`` 。(Xinference 版本应至少为 v2.0)" + diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index 0cf57e47df..8c49bab290 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -216,8 +216,8 @@ msgstr "" #: ../../source/models/virtualenv.rst:133 msgid "" -"This feature is enabled by default in ``v2.0.0``. To disable it, set " -"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``." +"This feature is enabled by default in ``v2.0.0`` . 
To disable it, set " +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` ." msgstr "" "此功能在``v2.0.0``版本中默认启用。若需禁用,请设置" "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" @@ -230,7 +230,7 @@ msgstr "优势" msgid "" "Avoid redundant installations of large dependencies (e.g., ``torch`` + " "``CUDA``)." -msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA``)。" +msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA`` )。" #: ../../source/models/virtualenv.rst:140 msgid "Speed up virtual environment creation." diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst index d129505d53..0db7d13f31 100644 --- a/doc/source/models/virtualenv.rst +++ b/doc/source/models/virtualenv.rst @@ -113,7 +113,7 @@ By default, the model’s virtual environment is stored under path: * Since v2.0: :ref:`XINFERENCE_HOME ` / virtualenv / v4 / {model_name} / {model_engine} / {python_version} Skip Installed Libraries -######################## +~~~~~~~~~~~~~~~~~~~~~~~~ .. _skip_installed_libraries: From c5759029daf66c4d84937e750128b0e5c05a1c42 Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 17:11:55 +0800 Subject: [PATCH 5/8] fix doc bug --- doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index 8c49bab290..381a6d79dc 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -21,7 +21,7 @@ msgstr "" #: ../../source/models/virtualenv.rst:6 msgid "Model Virtual Environments" -msgstr "模型虚拟空间" +msgstr "模型虚拟环境" #: ../../source/models/virtualenv.rst:11 msgid "Background" @@ -49,7 +49,7 @@ msgstr "解决方案" msgid "" "To address this issue, we have introduced the **Model Virtual " "Environment** feature." 
-msgstr "为了解决这个问题,我们引入了 **模型虚拟空间** 功能。" +msgstr "为了解决这个问题,我们引入了 **模型虚拟环境** 功能。" #: ../../source/models/virtualenv.rst:23 msgid "Install requirements for this functionality via" @@ -219,7 +219,7 @@ msgid "" "This feature is enabled by default in ``v2.0.0`` . To disable it, set " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` ." msgstr "" -"此功能在``v2.0.0``版本中默认启用。若需禁用,请设置" +"此功能在 ``v2.0.0`` 版本中默认启用。若需禁用,请设置" "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" #: ../../source/models/virtualenv.rst:137 From 6df227fafd358b21febe246f99115f5f2ab2847f Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 17:41:53 +0800 Subject: [PATCH 6/8] fix doc bug --- .../zh_CN/LC_MESSAGES/models/virtualenv.po | 117 +++++++++--------- doc/source/models/virtualenv.rst | 15 ++- 2 files changed, 65 insertions(+), 67 deletions(-) diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index 381a6d79dc..2dcf90c4b1 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-29 11:03+0800\n" +"POT-Creation-Date: 2026-01-29 17:40+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -61,8 +61,8 @@ msgid "" "``XINFERENCE_ENABLE_VIRTUAL_ENV=1``." msgstr "通过设置环境变量 ``XINFERENCE_ENABLE_VIRTUAL_ENV=1`` 启用该功能。" -#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:207 -#: ../../source/models/virtualenv.rst:223 +#: ../../source/models/virtualenv.rst:34 ../../source/models/virtualenv.rst:206 +#: ../../source/models/virtualenv.rst:222 msgid "Example usage:" msgstr "使用示例:" @@ -216,79 +216,79 @@ msgstr "" #: ../../source/models/virtualenv.rst:133 msgid "" -"This feature is enabled by default in ``v2.0.0`` . 
To disable it, set " -"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` ." +"This feature is enabled by default in ``v2.0``. To disable it, set " +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``." msgstr "" -"此功能在 ``v2.0.0`` 版本中默认启用。若需禁用,请设置" -"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" +"此功能在 ``v2.0`` " +"版本中默认启用。若需禁用,请设置 ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" -#: ../../source/models/virtualenv.rst:137 +#: ../../source/models/virtualenv.rst:136 msgid "Advantages" msgstr "优势" -#: ../../source/models/virtualenv.rst:139 +#: ../../source/models/virtualenv.rst:138 msgid "" "Avoid redundant installations of large dependencies (e.g., ``torch`` + " "``CUDA``)." msgstr "避免重复安装大型依赖(例如 ``torch`` + ``CUDA`` )。" -#: ../../source/models/virtualenv.rst:140 +#: ../../source/models/virtualenv.rst:139 msgid "Speed up virtual environment creation." msgstr "加快虚拟环境创建速度。" -#: ../../source/models/virtualenv.rst:141 +#: ../../source/models/virtualenv.rst:140 msgid "Reduce disk usage." msgstr "减少磁盘空间占用。" -#: ../../source/models/virtualenv.rst:144 +#: ../../source/models/virtualenv.rst:143 msgid "Usage" msgstr "使用" -#: ../../source/models/virtualenv.rst:156 +#: ../../source/models/virtualenv.rst:155 msgid "Performance Comparison" msgstr "性能对比" -#: ../../source/models/virtualenv.rst:158 +#: ../../source/models/virtualenv.rst:157 msgid "Using the ``CosyVoice 0.5B`` model as an example:" msgstr "以 ``CosyVoice 0.5B`` 模型为例:" -#: ../../source/models/virtualenv.rst:160 +#: ../../source/models/virtualenv.rst:159 msgid "**Without this feature enabled**::" msgstr "**未开启该功能时**::" -#: ../../source/models/virtualenv.rst:171 +#: ../../source/models/virtualenv.rst:170 msgid "**With this feature enabled**::" msgstr "**开启该功能后**::" -#: ../../source/models/virtualenv.rst:186 +#: ../../source/models/virtualenv.rst:185 msgid "Model Launching: Toggle Virtual Environments and Customize Dependencies" msgstr "模型加载:开关虚拟环境并自定义依赖" -#: ../../source/models/virtualenv.rst:190 +#: 
../../source/models/virtualenv.rst:189 msgid "" "Starting from v1.8.1, we support toggling the virtual environment for " "individual model launching, as well as overriding the model's default " "settings with custom package dependencies." msgstr "从 v1.8.1 开始,我们支持对单个模型加载开关虚拟环境,并用自定义包依赖覆盖模型的默认设置。" -#: ../../source/models/virtualenv.rst:194 +#: ../../source/models/virtualenv.rst:193 msgid "Toggle Virtual Environment" msgstr "开关模型虚拟空间" -#: ../../source/models/virtualenv.rst:196 +#: ../../source/models/virtualenv.rst:195 msgid "" "When loading a model, you can specify whether to enable the model's " "virtual environment. If not specified, the setting will follow the " "environment variable configuration." msgstr "加载模型时,可以指定是否启用模型的虚拟环境。如果未指定,则默认遵循环境变量的配置。" -#: ../../source/models/virtualenv.rst:199 +#: ../../source/models/virtualenv.rst:198 msgid "" "For the Web UI, this can be toggled on or off through the optional " "settings switch." msgstr "在 Web UI 中,可以通过可选设置开关打开或关闭该功能。" -#: ../../source/models/virtualenv.rst:205 +#: ../../source/models/virtualenv.rst:204 msgid "" "For command-line loading, use the ``--enable-virtual-env`` option to " "enable the virtual environment, or ``--disable-virtual-env`` to disable " @@ -297,11 +297,11 @@ msgstr "" "命令行加载时,使用 ``--enable-virtual-env`` 选项启用虚拟环境,使用 ``--disable-virtual-env`` " "选项禁用虚拟环境。" -#: ../../source/models/virtualenv.rst:214 +#: ../../source/models/virtualenv.rst:213 msgid "Set Virtual Environment Package Dependencies" msgstr "设置虚拟环境包依赖" -#: ../../source/models/virtualenv.rst:216 +#: ../../source/models/virtualenv.rst:215 msgid "" "For supported models, Xinference has already defined the package " "dependencies and version requirements within the virtual environment. " @@ -309,25 +309,25 @@ msgid "" " dependencies, you can manually provide them during model loading." 
msgstr "对于支持的模型,Xinference 已经在虚拟环境中定义了包依赖和版本要求。但如果需要指定特定版本或安装额外依赖,可以在加载模型时手动提供。" -#: ../../source/models/virtualenv.rst:219 +#: ../../source/models/virtualenv.rst:218 msgid "" "In the Web UI, you can add custom dependencies by clicking the plus icon " "in the same location as the virtual environment toggle." msgstr "在 Web UI 中,可以在虚拟环境开关同一位置点击加号图标来添加自定义依赖。" -#: ../../source/models/virtualenv.rst:221 +#: ../../source/models/virtualenv.rst:220 msgid "" "For the command line, use ``--virtual-env-package`` or ``-vp`` to specify" " a single package version." msgstr "命令行中,使用 ``--virtual-env-package`` 或 ``-vp`` 来指定单个包版本。" -#: ../../source/models/virtualenv.rst:229 +#: ../../source/models/virtualenv.rst:228 msgid "" "In addition to the standard way of specifying package dependencies, such " "as ``transformers==xxx``, Xinference also supports some extended syntax." msgstr "除了常规的包依赖指定方式(如 ``transformers==xxx``),Xinference 还支持一些扩展语法。" -#: ../../source/models/virtualenv.rst:231 +#: ../../source/models/virtualenv.rst:230 msgid "" "``#system_xxx#``: Using the same version as the system site packages, " "such as ``#system_numpy#``, ensures that the installed package matches " @@ -337,21 +337,21 @@ msgstr "" "``#system_xxx#``:使用与系统 site packages 相同的版本,例如 " "``#system_numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止依赖冲突。" -#: ../../source/models/virtualenv.rst:235 +#: ../../source/models/virtualenv.rst:234 msgid "ModelHub JSON for Xinference Models" msgstr "ModelHub JSON 格式(适用于 Xinference 模型)" -#: ../../source/models/virtualenv.rst:237 +#: ../../source/models/virtualenv.rst:236 msgid "" "If you plan to add a model to a model hub for Xinference, define a " "``virtualenv`` block in the model JSON. Starting from v2.0 (v4 flow), " "**engine-aware markers are recommended** so one JSON can cover multiple " "engines." 
msgstr "" -"若计划将模型添加至Xinference的Model Hub,请在模型JSON中定义一个``virtualenv``块。" -"自v2.0(v4流程)起, **建议使用引擎感知标记** ,以便单个JSON文件覆盖多个引擎。" +"若计划将模型添加至Xinference的Model Hub,请在模型JSON中定义一个``virtualenv``块。自v2.0(v4流程)起, " +"**建议使用引擎感知标记** ,以便单个JSON文件覆盖多个引擎。" -#: ../../source/models/virtualenv.rst:241 +#: ../../source/models/virtualenv.rst:240 msgid "" "Important rule: If a new model supports a specific engine, you **must** " "include at least one package entry for that engine in " @@ -363,17 +363,17 @@ msgstr "" "中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` " ")。当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" -#: ../../source/models/virtualenv.rst:268 +#: ../../source/models/virtualenv.rst:267 msgid "``packages`` (required): list of pip requirement strings or markers." msgstr " ``packages`` (必填):pip 要求字符串或标记的列表。" -#: ../../source/models/virtualenv.rst:269 +#: ../../source/models/virtualenv.rst:268 msgid "" "``inherit_pip_config`` (default ``true``): inherit system pip " "configuration if present." msgstr " ``inherit_pip_config`` (默认值为 ``true`` ):若存在系统 pip 配置文件,则继承其设置。" -#: ../../source/models/virtualenv.rst:270 +#: ../../source/models/virtualenv.rst:269 msgid "" "``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``: " "pip index and mirror controls." @@ -381,49 +381,49 @@ msgstr "" " ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` " ": pip 索引和镜像控制。" -#: ../../source/models/virtualenv.rst:272 +#: ../../source/models/virtualenv.rst:271 msgid "" "``index_strategy``: passed through to the virtualenv installer (used by " "some engines)." msgstr " ``index_strategy`` :传递给虚拟环境安装程序(由某些引擎使用)。" -#: ../../source/models/virtualenv.rst:273 +#: ../../source/models/virtualenv.rst:272 msgid "``no_build_isolation``: pip build isolation switch for tricky builds." 
msgstr " ``no_build_isolation`` :用于处理复杂构建的pip构建隔离开关。" -#: ../../source/models/virtualenv.rst:278 +#: ../../source/models/virtualenv.rst:277 msgid "Use wrapped placeholders to inject engine defaults:" msgstr "使用包裹的占位符注入引擎默认值:" -#: ../../source/models/virtualenv.rst:280 +#: ../../source/models/virtualenv.rst:279 msgid "``#vllm_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:281 +#: ../../source/models/virtualenv.rst:280 msgid "``#sglang_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:282 +#: ../../source/models/virtualenv.rst:281 msgid "``#mlx_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:283 +#: ../../source/models/virtualenv.rst:282 msgid "``#transformers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:284 +#: ../../source/models/virtualenv.rst:283 msgid "``#llama_cpp_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:285 +#: ../../source/models/virtualenv.rst:284 msgid "``#diffusers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:286 +#: ../../source/models/virtualenv.rst:285 msgid "``#sentence_transformers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:291 +#: ../../source/models/virtualenv.rst:290 msgid "" "Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-" "sensitive). Engine values are passed in lowercase internally, so prefer " @@ -434,22 +434,18 @@ msgstr "" "进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值,例如 ``#engine# == \"vllm\"`` 或 " "``#engine# == \"transformers\"`` 。" -#: ../../source/models/virtualenv.rst:299 -msgid "Manage Virtual Enviroments" -msgstr "虚拟环境管理" - -#: ../../source/models/virtualenv.rst:303 +#: ../../source/models/virtualenv.rst:302 msgid "" "Xinference provides comprehensive virtual environment management for " "model dependencies, allowing you to create isolated Python environments " "for each model with specific package requirements." 
msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" -#: ../../source/models/virtualenv.rst:315 +#: ../../source/models/virtualenv.rst:314 msgid "Key Features" msgstr "核心功能" -#: ../../source/models/virtualenv.rst:317 +#: ../../source/models/virtualenv.rst:316 msgid "" "**Multiple Python Version Support**: Each model can have virtual " "environments with different Python versions (e.g., Python 3.10.18, " @@ -458,23 +454,23 @@ msgstr "" "**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " "3.10.18、3.11.5),实现与各种模型要求的兼容性。" -#: ../../source/models/virtualenv.rst:322 +#: ../../source/models/virtualenv.rst:321 msgid "" "**Isolated Dependencies**: Each virtual environment contains its own set " "of packages, preventing conflicts between different models' requirements." msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" -#: ../../source/models/virtualenv.rst:327 +#: ../../source/models/virtualenv.rst:326 msgid "Management Operations" msgstr "管理操作" -#: ../../source/models/virtualenv.rst:329 +#: ../../source/models/virtualenv.rst:328 msgid "" "**Listing Virtual Environments**: View all virtual environments across " "your cluster, filtered by model name or worker IP address." msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" -#: ../../source/models/virtualenv.rst:333 +#: ../../source/models/virtualenv.rst:332 msgid "" "**Creating Environments**: Automatically created when launching models " "with enable_virtual_env=true. The system detects your current Python " @@ -483,10 +479,13 @@ msgstr "" "**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python " "版本并创建包含所需包的独立环境。" -#: ../../source/models/virtualenv.rst:338 +#: ../../source/models/virtualenv.rst:337 msgid "" "**Removing Environments**: Delete specific virtual environments by model " "name and optionally Python version, or remove all environments for a " "model." 
msgstr "**删除环境** : 可按模型名称和可选的 Python 版本删除特定虚拟环境,或删除模型的所有环境。" +#~ msgid "Manage Virtual Enviroments" +#~ msgstr "虚拟环境管理" + diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst index 0db7d13f31..3f18c2d564 100644 --- a/doc/source/models/virtualenv.rst +++ b/doc/source/models/virtualenv.rst @@ -113,7 +113,7 @@ By default, the model’s virtual environment is stored under path: * Since v2.0: :ref:`XINFERENCE_HOME ` / virtualenv / v4 / {model_name} / {model_engine} / {python_version} Skip Installed Libraries -~~~~~~~~~~~~~~~~~~~~~~~~ +######################## .. _skip_installed_libraries: @@ -128,10 +128,9 @@ This ensures better isolation from system packages but can result in redundant i Starting from ``v1.8.1``, an **experimental feature** is available: by setting the environment variable ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1``, ``uv`` will **skip packages already available in system site-packages**. -.. versionchanged:: v2.0.0 +.. versionchanged:: v2.0 - This feature is enabled by default in ``v2.0.0``. To disable it, set - ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``. + This feature is enabled by default in ``v2.0``. To disable it, set ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``. Advantages ~~~~~~~~~~ @@ -183,7 +182,7 @@ Using the ``CosyVoice 0.5B`` model as an example: .. _model_launching_virtualenv: Model Launching: Toggle Virtual Environments and Customize Dependencies ------------------------------------------------------------------------ +####################################################################### .. versionadded:: v1.8.1 @@ -232,7 +231,7 @@ In addition to the standard way of specifying package dependencies, such as ``tr ensures that the installed package matches the system site package version of numpy. This helps prevent dependency conflicts. 
ModelHub JSON for Xinference Models -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +################################### If you plan to add a model to a model hub for Xinference, define a ``virtualenv`` block in the model JSON. Starting from v2.0 (v4 flow), **engine-aware markers are recommended** @@ -312,7 +311,7 @@ allowing you to create isolated Python environments for each model with specific actor Key Features -~~~~~~~~~~ +############ **Multiple Python Version Support**: Each model can have virtual environments @@ -324,7 +323,7 @@ Each virtual environment contains its own set of packages, preventing conflicts between different models' requirements. Management Operations -~~~~~ +##################### **Listing Virtual Environments**: View all virtual environments across your cluster, From c7fe610f9c3ffbc02135516d8a56de7a2e8fa698 Mon Sep 17 00:00:00 2001 From: OliverBryant <2713999266@qq.com> Date: Thu, 29 Jan 2026 18:52:56 +0800 Subject: [PATCH 7/8] fix doc error --- .../zh_CN/LC_MESSAGES/models/virtualenv.po | 148 +++++++++--------- doc/source/models/virtualenv.rst | 93 +++++------ 2 files changed, 123 insertions(+), 118 deletions(-) diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po index 2dcf90c4b1..907443edb8 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/virtualenv.po @@ -8,7 +8,7 @@ msgid "" msgstr "" "Project-Id-Version: Xinference \n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2026-01-29 17:40+0800\n" +"POT-Creation-Date: 2026-01-29 18:51+0800\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language: zh_CN\n" @@ -219,8 +219,8 @@ msgid "" "This feature is enabled by default in ``v2.0``. To disable it, set " "``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0``." 
msgstr "" -"此功能在 ``v2.0`` " -"版本中默认启用。若需禁用,请设置 ``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" +"此功能在 ``v2.0`` 版本中默认启用。若需禁用,请设置 " +"``XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0`` 。" #: ../../source/models/virtualenv.rst:136 msgid "Advantages" @@ -337,11 +337,67 @@ msgstr "" "``#system_xxx#``:使用与系统 site packages 相同的版本,例如 " "``#system_numpy#``,确保安装的包版本与系统 site packages 中的 numpy 版本一致,防止依赖冲突。" -#: ../../source/models/virtualenv.rst:234 +#: ../../source/models/virtualenv.rst:237 +msgid "Manage Virtual Enviroments" +msgstr "虚拟环境管理" + +#: ../../source/models/virtualenv.rst:241 +msgid "" +"Xinference provides comprehensive virtual environment management for " +"model dependencies, allowing you to create isolated Python environments " +"for each model with specific package requirements." +msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" + +#: ../../source/models/virtualenv.rst:253 +msgid "Key Features" +msgstr "核心功能" + +#: ../../source/models/virtualenv.rst:255 +msgid "" +"**Multiple Python Version Support**: Each model can have virtual " +"environments with different Python versions (e.g., Python 3.10.18, " +"3.11.5), enabling compatibility with various model requirements." +msgstr "" +"**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " +"3.10.18、3.11.5),实现与各种模型要求的兼容性。" + +#: ../../source/models/virtualenv.rst:260 +msgid "" +"**Isolated Dependencies**: Each virtual environment contains its own set " +"of packages, preventing conflicts between different models' requirements." +msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" + +#: ../../source/models/virtualenv.rst:265 +msgid "Management Operations" +msgstr "管理操作" + +#: ../../source/models/virtualenv.rst:267 +msgid "" +"**Listing Virtual Environments**: View all virtual environments across " +"your cluster, filtered by model name or worker IP address." 
+msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" + +#: ../../source/models/virtualenv.rst:271 +msgid "" +"**Creating Environments**: Automatically created when launching models " +"with enable_virtual_env=true. The system detects your current Python " +"version and creates an isolated environment with the required packages." +msgstr "" +"**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python " +"版本并创建包含所需包的独立环境。" + +#: ../../source/models/virtualenv.rst:276 +msgid "" +"**Removing Environments**: Delete specific virtual environments by model " +"name and optionally Python version, or remove all environments for a " +"model." +msgstr "**删除环境** : 可按模型名称和可选的 Python 版本删除特定虚拟环境,或删除模型的所有环境。" + +#: ../../source/models/virtualenv.rst:281 msgid "ModelHub JSON for Xinference Models" msgstr "ModelHub JSON 格式(适用于 Xinference 模型)" -#: ../../source/models/virtualenv.rst:236 +#: ../../source/models/virtualenv.rst:283 msgid "" "If you plan to add a model to a model hub for Xinference, define a " "``virtualenv`` block in the model JSON. Starting from v2.0 (v4 flow), " @@ -351,7 +407,7 @@ msgstr "" "若计划将模型添加至Xinference的Model Hub,请在模型JSON中定义一个``virtualenv``块。自v2.0(v4流程)起, " "**建议使用引擎感知标记** ,以便单个JSON文件覆盖多个引擎。" -#: ../../source/models/virtualenv.rst:240 +#: ../../source/models/virtualenv.rst:287 msgid "" "Important rule: If a new model supports a specific engine, you **must** " "include at least one package entry for that engine in " @@ -363,17 +419,17 @@ msgstr "" "中至少包含该引擎的一个包条目,并附加标记(例如 ``#engine# == \"vllm\"`` " ")。当虚拟环境启用时,引擎可用性检查依赖这些标记进行验证。" -#: ../../source/models/virtualenv.rst:267 +#: ../../source/models/virtualenv.rst:314 msgid "``packages`` (required): list of pip requirement strings or markers." msgstr " ``packages`` (必填):pip 要求字符串或标记的列表。" -#: ../../source/models/virtualenv.rst:268 +#: ../../source/models/virtualenv.rst:315 msgid "" "``inherit_pip_config`` (default ``true``): inherit system pip " "configuration if present." 
msgstr " ``inherit_pip_config`` (默认值为 ``true`` ):若存在系统 pip 配置文件,则继承其设置。" -#: ../../source/models/virtualenv.rst:269 +#: ../../source/models/virtualenv.rst:316 msgid "" "``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host``: " "pip index and mirror controls." @@ -381,49 +437,49 @@ msgstr "" " ``index_url`` / ``extra_index_url`` / ``find_links`` / ``trusted_host`` " ": pip 索引和镜像控制。" -#: ../../source/models/virtualenv.rst:271 +#: ../../source/models/virtualenv.rst:318 msgid "" "``index_strategy``: passed through to the virtualenv installer (used by " "some engines)." msgstr " ``index_strategy`` :传递给虚拟环境安装程序(由某些引擎使用)。" -#: ../../source/models/virtualenv.rst:272 +#: ../../source/models/virtualenv.rst:319 msgid "``no_build_isolation``: pip build isolation switch for tricky builds." msgstr " ``no_build_isolation`` :用于处理复杂构建的pip构建隔离开关。" -#: ../../source/models/virtualenv.rst:277 +#: ../../source/models/virtualenv.rst:324 msgid "Use wrapped placeholders to inject engine defaults:" msgstr "使用包裹的占位符注入引擎默认值:" -#: ../../source/models/virtualenv.rst:279 +#: ../../source/models/virtualenv.rst:326 msgid "``#vllm_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:280 +#: ../../source/models/virtualenv.rst:327 msgid "``#sglang_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:281 +#: ../../source/models/virtualenv.rst:328 msgid "``#mlx_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:282 +#: ../../source/models/virtualenv.rst:329 msgid "``#transformers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:283 +#: ../../source/models/virtualenv.rst:330 msgid "``#llama_cpp_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:284 +#: ../../source/models/virtualenv.rst:331 msgid "``#diffusers_dependencies#``" msgstr "" -#: ../../source/models/virtualenv.rst:285 +#: ../../source/models/virtualenv.rst:332 msgid "``#sentence_transformers_dependencies#``" msgstr "" -#: 
../../source/models/virtualenv.rst:290 +#: ../../source/models/virtualenv.rst:337 msgid "" "Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-" "sensitive). Engine values are passed in lowercase internally, so prefer " @@ -434,58 +490,6 @@ msgstr "" "进行比较(区分大小写)。引擎值在内部以小写形式传递,因此建议使用小写值,例如 ``#engine# == \"vllm\"`` 或 " "``#engine# == \"transformers\"`` 。" -#: ../../source/models/virtualenv.rst:302 -msgid "" -"Xinference provides comprehensive virtual environment management for " -"model dependencies, allowing you to create isolated Python environments " -"for each model with specific package requirements." -msgstr "Xinference 提供全面的虚拟环境管理功能,允许您为每个模型创建独立的 Python 环境,满足特定的包依赖需求。" - -#: ../../source/models/virtualenv.rst:314 -msgid "Key Features" -msgstr "核心功能" - -#: ../../source/models/virtualenv.rst:316 -msgid "" -"**Multiple Python Version Support**: Each model can have virtual " -"environments with different Python versions (e.g., Python 3.10.18, " -"3.11.5), enabling compatibility with various model requirements." -msgstr "" -"**多 Python 版本支持** : 每个模型可以拥有不同 Python 版本的虚拟环境(例如 Python " -"3.10.18、3.11.5),实现与各种模型要求的兼容性。" - -#: ../../source/models/virtualenv.rst:321 -msgid "" -"**Isolated Dependencies**: Each virtual environment contains its own set " -"of packages, preventing conflicts between different models' requirements." -msgstr "**依赖隔离** : 每个虚拟环境包含自己独立的包集合,防止不同模型之间的依赖冲突。" - -#: ../../source/models/virtualenv.rst:326 -msgid "Management Operations" -msgstr "管理操作" - -#: ../../source/models/virtualenv.rst:328 -msgid "" -"**Listing Virtual Environments**: View all virtual environments across " -"your cluster, filtered by model name or worker IP address." -msgstr "**列出虚拟环境** : 查看集群中的所有虚拟环境,支持按模型名称或工作节点 IP 地址过滤。" - -#: ../../source/models/virtualenv.rst:332 -msgid "" -"**Creating Environments**: Automatically created when launching models " -"with enable_virtual_env=true. 
The system detects your current Python "
-"version and creates an isolated environment with the required packages."
-msgstr ""
-"**创建环境** : 当使用 enable_virtual_env=true 启动模型时自动创建。系统会检测当前的 Python "
-"版本并创建包含所需包的独立环境。"
-
-#: ../../source/models/virtualenv.rst:337
-msgid ""
-"**Removing Environments**: Delete specific virtual environments by model "
-"name and optionally Python version, or remove all environments for a "
-"model."
-msgstr "**删除环境** : 可按模型名称和可选的 Python 版本删除特定虚拟环境,或删除模型的所有环境。"
-
 #~ msgid "Manage Virtual Enviroments"
 #~ msgstr "虚拟环境管理"

diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst
index 3f18c2d564..c40f0f37fe 100644
--- a/doc/source/models/virtualenv.rst
+++ b/doc/source/models/virtualenv.rst
@@ -230,6 +230,53 @@ In addition to the standard way of specifying package dependencies, such as ``tr
 * ``#system_xxx#``: Using the same version as the system site packages,
   such as ``#system_numpy#``, ensures that the installed package matches
   the system site package version of numpy. This helps prevent dependency conflicts.
+
+.. _manage_virtual_enviroments:
+
+Manage Virtual Environments
+---------------------------
+
+.. versionadded:: v1.14.0
+
+Xinference provides comprehensive virtual environment management for model dependencies,
+allowing you to create isolated Python environments for each model with specific package requirements.
+
+.. raw:: html
+
+   actor
+
+.. raw:: html
+
+   actor
+
+Key Features
+############
+
+**Multiple Python Version Support**:
+Each model can have virtual environments
+with different Python versions (e.g., Python 3.10.18, 3.11.5),
+enabling compatibility with various model requirements.
+
+**Isolated Dependencies**:
+Each virtual environment contains its own set of packages,
+preventing conflicts between different models' requirements.
+ +Management Operations +##################### + +**Listing Virtual Environments**: +View all virtual environments across your cluster, +filtered by model name or worker IP address. + +**Creating Environments**: +Automatically created when launching models with enable_virtual_env=true. +The system detects your current Python version and creates an isolated +environment with the required packages. + +**Removing Environments**: +Delete specific virtual environments by model name and optionally +Python version, or remove all environments for a model. + ModelHub JSON for Xinference Models ################################### @@ -291,49 +338,3 @@ Markers use ``#engine#`` or ``#model_engine#`` comparisons (case-sensitive). Engine values are passed in lowercase internally, so prefer lowercase values, for example ``#engine# == "vllm"`` or ``#engine# == "transformers"``. - -.. _manage_virtual_enviroments: - -Manage Virtual Enviroments ------------------------- - -.. versionadded:: v1.14.0 - -Xinference provides comprehensive virtual environment management for model dependencies, -allowing you to create isolated Python environments for each model with specific package requirements. - -.. raw:: html - - actor - -.. raw:: html - - actor - -Key Features -############ - -**Multiple Python Version Support**: -Each model can have virtual environments -with different Python versions (e.g., Python 3.10.18, 3.11.5), -enabling compatibility with various model requirements. - -**Isolated Dependencies**: -Each virtual environment contains its own set of packages, -preventing conflicts between different models' requirements. - -Management Operations -##################### - -**Listing Virtual Environments**: -View all virtual environments across your cluster, -filtered by model name or worker IP address. - -**Creating Environments**: -Automatically created when launching models with enable_virtual_env=true. 
-The system detects your current Python version and creates an isolated
-environment with the required packages.
-
-**Removing Environments**:
-Delete specific virtual environments by model name and optionally
-Python version, or remove all environments for a model.

From c56a64dbd750cc0427f943de9fd79d615246af87 Mon Sep 17 00:00:00 2001
From: OliverBryant <2713999266@qq.com>
Date: Thu, 29 Jan 2026 18:58:33 +0800
Subject: [PATCH 8/8] fix doc error

---
 doc/source/models/virtualenv.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/models/virtualenv.rst b/doc/source/models/virtualenv.rst
index c40f0f37fe..a3526a05b1 100644
--- a/doc/source/models/virtualenv.rst
+++ b/doc/source/models/virtualenv.rst
@@ -234,7 +234,7 @@ In addition to the standard way of specifying package dependencies, such as ``tr
 .. _manage_virtual_enviroments:
 
 Manage Virtual Environments
----------------------------
+###########################
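The engine-aware marker syntax documented in the patches above (requirement strings carrying an ``#engine# == "vllm"``-style comparison, plus wrapped placeholders such as ``#vllm_dependencies#``) can be sketched in a few lines. The snippet below is an illustrative resolver only, not Xinference's actual implementation; the ``ENGINE_DEFAULTS`` table and the ``resolve_packages`` helper are invented for this example, and the only behavior taken from the docs is the lowercase engine comparison:

```python
# Illustrative sketch only -- NOT Xinference's actual resolver.
# It mimics the documented behaviour: entries carrying an
# '#engine# == "..."' (or '#model_engine# == "..."') marker are kept
# only for the matching engine, and wrapped placeholders such as
# "#vllm_dependencies#" expand to per-engine default package lists.
import re

# Hypothetical per-engine defaults; the real lists live inside Xinference.
ENGINE_DEFAULTS = {
    "#vllm_dependencies#": ["vllm"],
    "#transformers_dependencies#": ["transformers", "accelerate"],
}


def resolve_packages(packages, engine):
    """Filter and expand a ``virtualenv.packages`` list for one engine."""
    engine = engine.lower()  # engine values are passed in lowercase internally
    resolved = []
    for entry in packages:
        if entry in ENGINE_DEFAULTS:  # wrapped placeholder
            resolved.extend(ENGINE_DEFAULTS[entry])
            continue
        marker = re.match(
            r'^(.*?);\s*#(?:model_)?engine#\s*==\s*"([^"]+)"\s*$', entry
        )
        if marker:  # engine-aware entry: keep only for the matching engine
            if marker.group(2) == engine:
                resolved.append(marker.group(1).strip())
        else:  # plain requirement string: always keep
            resolved.append(entry)
    return resolved


print(resolve_packages(
    ["numpy", 'vllm>=0.8 ; #engine# == "vllm"', "#transformers_dependencies#"],
    "vllm",
))
# -> ['numpy', 'vllm>=0.8', 'transformers', 'accelerate']
```

Under this sketch, an engine availability check would see at least one surviving entry for each supported engine, which is why the docs require one marked package entry per engine; treat the exact matching rules as an assumption to verify against the Xinference source.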