
Commit c25631e

[Doc] Add the release note for 0.7.3rc1 (#285)
Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <[email protected]>
1 parent 41aba1c commit c25631e

File tree: 8 files changed, +81 -44 lines


docs/source/developer_guide/versioning_policy.md

Lines changed: 11 additions & 9 deletions
@@ -5,23 +5,23 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
 ## vLLM Ascend Plugin versions
 
 Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
-`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
+`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
 
 - **Final releases**: typically released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
 - **Pre releases**: typically released **on demand**; the `rcN` suffix marks the Nth release candidate, supporting early testing by our users prior to a final release.
 - **Post releases**: typically released **on demand** to address minor errors in a final release. Unlike the [PEP-440 post release](https://peps.python.org/pep-0440/#post-releases) suggestion, a post release contains actual bug fixes, because the final release version must match the vLLM final release version (`v[major].[minor].[micro]`) strictly. A post version has to be published as a patch version of the final release.
 
 For example (a parsing sketch follows this list):
 - `v0.7.x`: the first final release matching the vLLM `v0.7.x` version.
-- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
-- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
+- `v0.7.3rc1`: will be the first pre version of vllm-ascend.
+- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
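To make the scheme concrete, here is a minimal parsing sketch (an editor's illustration, not part of this commit; the regex is an assumption derived from the `v[major].[minor].[micro][rcN][.postN]` pattern above):

```python
import re

# Assumed pattern for the version scheme described above; not part of the commit.
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

for tag in ("v0.7.3rc1", "v0.7.3", "v0.7.3.post1"):
    match = VERSION_RE.match(tag)
    assert match is not None
    print(tag, "->", match.groupdict())
# v0.7.3rc1 -> {'major': '0', 'minor': '7', 'micro': '3', 'rc': '1', 'post': None}
```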
 
 ## Branch policy
 
 vllm-ascend has a main branch and dev branches.
 
 - **main**: the main branch, corresponding to the vLLM main branch; it is continuously monitored for quality through Ascend CI.
-- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
+- **vX.Y.Z-dev**: development branch, created for some new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for the vLLM `v0.7.3` version.
 
 Usually, a commit should be merged ONLY into the main branch first, and then backported to the dev branch, to reduce maintenance costs as much as possible.

@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |--------------|--------------| --- | --- | --- |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
 | v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
 
 ## Release cadence
 
-### Next final release (`v0.7.x`) window
+### Next final release (`v0.7.3`) window
 
-| Date | Event |
-|------------|------------------------------------------------------------------|
-| March 2025 | Release candidates, v0.7.3rc1 |
-| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
+| Date | Event |
+|------------|-------------------------------------------|
+| 2025.03.14 | Release candidates, v0.7.3rc1 |
+| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
+| 2025.03.30 | Final release, v0.7.3 |

docs/source/faqs.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## Version Specific FAQs
 
 - [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
+- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
 
 ## General FAQs

docs/source/tutorials/multi_node.md

Lines changed: 5 additions & 1 deletion
@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
 ```
 
+:::{note}
+If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the model's `config.json` file, since it's not supported by vllm-ascend currently. A removal sketch follows this note.
+:::
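As a sketch of the removal step described in the note above (an editor's illustration, not part of the commit; the config path is a placeholder):

```python
import json
from pathlib import Path

# Placeholder path to the downloaded DeepSeek V3/R1 weights directory.
config_path = Path("/path/to/DeepSeek-R1/config.json")

config = json.loads(config_path.read_text())
# Drop the section vllm-ascend cannot handle yet; no-op if it is absent.
config.pop("quantization_config", None)
config_path.write_text(json.dumps(config, indent=2))
```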
+
 Start the vLLM server on head node:
 
 ```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
 ```
 INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
 INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
-```
+```

docs/source/tutorials/single_npu.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Single NPU (Qwen 7B)
+# Single NPU (Qwen2.5 7B)
 
 ## Run vllm-ascend on Single NPU

docs/source/tutorials/single_npu_multimodal.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5-VL-7B)
+# Single NPU (Qwen2.5-VL 7B)
 
 ## Run vllm-ascend on Single NPU

docs/source/user_guide/release_notes.md

Lines changed: 29 additions & 1 deletion
@@ -1,5 +1,33 @@
 # Release note
 
+## v0.7.3rc1
+
+🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
+- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
+
+### Highlights
+- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to get started! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
+- The speculative decoding feature is supported (a usage sketch follows this list). [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
+- The multi-step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
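A minimal offline sketch of the speculative decoding highlight above (an editor's illustration, not from the commit; it assumes the vLLM 0.7-era engine arguments `speculative_model` and `num_speculative_tokens`, and the model names are placeholders):

```python
from vllm import LLM, SamplingParams

# Placeholder target/draft pair; speculative decoding needs a small
# draft model that shares the target model's tokenizer.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    speculative_model="Qwen/Qwen2.5-0.5B-Instruct",
    num_speculative_tokens=4,
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```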
+
+### Core
+- Bump the torch_npu version to dev20250308.3 to improve `_exponential` accuracy.
+- Added initial support for pooling models. BERT-based models such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3` work now (see the embedding sketch after this list). [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
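A short embedding sketch for the pooling support above (an editor's illustration, assuming vLLM's `task="embed"` pooling API; the model name comes from the release note):

```python
from vllm import LLM

# BERT-based embedding model named in the release note.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")

(out,) = llm.embed(["vllm-ascend now supports pooling models"])
print(len(out.outputs.embedding))  # embedding dimension of the model
```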
+
+### Model
+- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
+- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
+
+### Other
+- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1. [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
+- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen, and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details.
+- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
+
+### Known issues
+- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the accuracy of the output may be incorrect. We are working on it; it will be fixed in the next release.
+- Garbled model output has been improved and reduced, but if you still hit the issue, try changing the generation config values, such as `temperature`, and try again (a sampling sketch follows this list). There is also a known issue noted above. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
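For the garbled-output workaround above, adjusting the sampling values is a one-liner (an editor's illustration; the values are arbitrary examples, not recommendations from the release note):

```python
from vllm import SamplingParams

# Lower temperature / tighter top_p are the kind of generation-config
# tweaks the known-issues note suggests trying.
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=128)
```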
 
 ## v0.7.1rc1
 
 🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
 
 vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
 
-Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
+Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
 
 ### Highlights
docs/source/user_guide/supported_models.md

Lines changed: 14 additions & 12 deletions
@@ -2,25 +2,27 @@
 
 | Model | Supported | Note |
 |---------|-----------|------|
-| Qwen 2.5 |✅||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
 | DeepSeek v3 |✅||
-| DeepSeek Distill (Qwen/llama) |✅||
+| DeepSeek R1 |✅||
+| DeepSeek Distill (Qwen/Llama) |✅||
+| Qwen2-VL |✅||
+| Qwen2-Audio |✅||
+| Qwen2.5 |✅||
+| Qwen2.5-VL |✅||
+| MiniCPM |✅||
 | Llama3.1/3.2 |✅||
+| Mistral | | Need test |
+| DeepSeek v2.5 | | Need test |
 | Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm |✅||
-| ChatGLM |✅||
-| InternVL 2.5 |✅||
-| Qwen2-VL |✅||
+| Baichuan | | Need test |
+| Internlm |✅||
+| ChatGLM |❌| Plan in Q2 |
+| InternVL2.5 |✅||
 | GLM-4v | | Need test |
 | Molmo |✅||
-| LLaVA 1.5 |✅||
+| LLaVA1.5 | | Need test |
 | Mllama | | Need test |
 | LLaVA-Next | | Need test |
 | LLaVA-Next-Video | | Need test |
 | Phi-3-Vision/Phi-3.5-Vision | | Need test |
 | Ultravox | | Need test |
-| Qwen2-Audio |✅||
Lines changed: 19 additions & 19 deletions
@@ -1,21 +1,21 @@
 # Feature Support
 
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill |❌| Plan in 2025 Q1 |
-| Automatic Prefix Caching | | Improve performance in 2025 Q2 |
-| LoRA |❌| Plan in 2025 Q1 |
-| Prompt adapter |❌| Plan in 2025 Q1 |
-| Speculative decoding |❌| Plan in 2025 Q1 |
-| Pooling |❌||
-| Enc-dec |❌| Plan in 2025 Q2 |
-| Multi Modality |✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
-| LogProbs |✅||
-| Prompt logProbs |✅||
-| Async output |✅||
-| Multi step scheduler |❌| Plan in 2025 Q1 |
-| Best of |✅||
-| Beam search |✅||
-| Guided Decoding | | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel |✅| Only "mp" supported now |
-| Pipeline Parallel |✅| Only "mp" supported now |
+| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
+|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
+| Chunked Prefill |❌| | | NA | Plan in 2025.03.30 |
+| Automatic Prefix Caching |❌| | | NA | Plan in 2025.03.30 |
+| LoRA |❌| | | NA | Plan in 2025.06.30 |
+| Prompt adapter |❌| | | NA | Plan in 2025.06.30 |
+| Speculative decoding |✅| | | Basic functions available | Needs full testing |
+| Pooling |✅| | | Basic functions available (BERT) | Needs full testing; add more model support |
+| Enc-dec |❌| | | NA | Plan in 2025.06.30 |
+| Multi Modality |✅| |✅| Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance and add more model support |
+| LogProbs |✅| | | Basic functions available | Needs full testing |
+| Prompt logProbs |✅| | | Basic functions available | Needs full testing |
+| Async output |✅| | | Basic functions available | Needs full testing |
+| Multi step scheduler |✅| | | Basic functions available | Needs full testing |
+| Best of |✅| | | Basic functions available | Needs full testing |
+| Beam search |✅| | | Basic functions available | Needs full testing |
+| Guided Decoding |✅| | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
+| Tensor Parallel |✅| | | Basic functions available | Needs full testing |
+| Pipeline Parallel |✅| | | Basic functions available | Needs full testing |
