
Commit c25631e

[Doc] Add the release note for 0.7.3rc1 (#285)
Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <[email protected]>
1 parent 41aba1c commit c25631e

File tree: 8 files changed, +81 -44 lines


docs/source/developer_guide/versioning_policy.md

Lines changed: 11 additions & 9 deletions
@@ -5,23 +5,23 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
 ## vLLM Ascend Plugin versions
 
 Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
-`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
+`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
 
 - **Final releases**: typically released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
 - **Pre releases**: typically released **on demand**; the `rcN` suffix marks the Nth release candidate, supporting early testing by our users prior to a final release.
 - **Post releases**: typically released **on demand** to address minor errors in a final release. Unlike the [PEP-440 post release](https://peps.python.org/pep-0440/#post-releases) suggestion, a post release contains actual bug fixes, because the final release version must match the vLLM final release version (`v[major].[minor].[micro]`) strictly. A post version has to be published as a patch version of the final release.
 
 For example (a parsing sketch follows this list):
 - `v0.7.x`: the first final release matching the vLLM `v0.7.x` version.
-- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
-- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
+- `v0.7.3rc1`: will be the first pre version of vllm-ascend.
+- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
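To make the scheme concrete, here is a minimal parsing sketch (an editor's illustration, not part of this commit; the regex is an assumption derived from the `v[major].[minor].[micro][rcN][.postN]` pattern above):

```python
import re

# Assumed pattern for the version scheme described above; not part of the commit.
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

for tag in ("v0.7.3rc1", "v0.7.3", "v0.7.3.post1"):
    match = VERSION_RE.match(tag)
    assert match is not None
    print(tag, "->", match.groupdict())
# v0.7.3rc1 -> {'major': '0', 'minor': '7', 'micro': '3', 'rc': '1', 'post': None}
```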
 
 ## Branch policy
 
 vllm-ascend has a main branch and dev branches.
 
 - **main**: the main branch, corresponding to the vLLM main branch; it is continuously monitored for quality through Ascend CI.
-- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
+- **vX.Y.Z-dev**: development branch, created for some new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for the vLLM `v0.7.3` version.
 
 Usually, a commit should be merged ONLY into the main branch first, and then backported to the dev branch, to reduce maintenance costs as much as possible.

@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |--------------|--------------| --- | --- | --- |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
 | v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
 
 ## Release cadence
 
-### Next final release (`v0.7.x`) window
+### Next final release (`v0.7.3`) window
 
-| Date | Event |
-|------------|------------------------------------------------------------------|
-| March 2025 | Release candidates, v0.7.3rc1 |
-| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
+| Date | Event |
+|------------|-------------------------------------------|
+| 2025.03.14 | Release candidates, v0.7.3rc1 |
+| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
+| 2025.03.30 | Final release, v0.7.3 |

docs/source/faqs.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## Version Specific FAQs
 
 - [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
+- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
 
 ## General FAQs

docs/source/tutorials/multi_node.md

Lines changed: 5 additions & 1 deletion
@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
 ```
 
+:::{note}
+If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the model's `config.json` file, since it's not supported by vllm-ascend currently. A removal sketch follows this note.
+:::
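As a sketch of the removal step described in the note above (an editor's illustration, not part of the commit; the config path is a placeholder):

```python
import json
from pathlib import Path

# Placeholder path to the downloaded DeepSeek V3/R1 weights directory.
config_path = Path("/path/to/DeepSeek-R1/config.json")

config = json.loads(config_path.read_text())
# Drop the section vllm-ascend cannot handle yet; no-op if it is absent.
config.pop("quantization_config", None)
config_path.write_text(json.dumps(config, indent=2))
```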
+
 Start the vLLM server on head node:
 
 ```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
 ```
 INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
 INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
-```
+```

docs/source/tutorials/single_npu.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Single NPU (Qwen 7B)
+# Single NPU (Qwen2.5 7B)
 
 ## Run vllm-ascend on Single NPU

docs/source/tutorials/single_npu_multimodal.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5-VL-7B)
+# Single NPU (Qwen2.5-VL 7B)
 
 ## Run vllm-ascend on Single NPU

docs/source/user_guide/release_notes.md

Lines changed: 29 additions & 1 deletion
@@ -1,5 +1,33 @@
 # Release note
 
+## v0.7.3rc1
+
+🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
+- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
+
+### Highlights
+- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to get started! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
+- The speculative decoding feature is supported (a usage sketch follows this list). [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
+- The multi-step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
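A minimal offline sketch of the speculative decoding highlight above (an editor's illustration, not from the commit; it assumes the vLLM 0.7-era engine arguments `speculative_model` and `num_speculative_tokens`, and the model names are placeholders):

```python
from vllm import LLM, SamplingParams

# Placeholder target/draft pair; speculative decoding needs a small
# draft model that shares the target model's tokenizer.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    speculative_model="Qwen/Qwen2.5-0.5B-Instruct",
    num_speculative_tokens=4,
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```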
+
+### Core
+- Bump the torch_npu version to dev20250308.3 to improve `_exponential` accuracy.
+- Added initial support for pooling models. BERT-based models such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3` work now (see the embedding sketch after this list). [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
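A short embedding sketch for the pooling support above (an editor's illustration, assuming vLLM's `task="embed"` pooling API; the model name comes from the release note):

```python
from vllm import LLM

# BERT-based embedding model named in the release note.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")

(out,) = llm.embed(["vllm-ascend now supports pooling models"])
print(len(out.outputs.embedding))  # embedding dimension of the model
```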
+
+### Model
+- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
+- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
+
+### Other
+- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1. [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
+- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen, and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details.
+- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
+
+### Known issues
+- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the accuracy of the output may be incorrect. We are working on it; it will be fixed in the next release.
+- Garbled model output has been improved and reduced, but if you still hit the issue, try changing the generation config values, such as `temperature`, and try again (a sampling sketch follows this list). There is also a known issue noted above. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
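For the garbled-output workaround above, adjusting the sampling values is a one-liner (an editor's illustration; the values are arbitrary examples, not recommendations from the release note):

```python
from vllm import SamplingParams

# Lower temperature / tighter top_p are the kind of generation-config
# tweaks the known-issues note suggests trying.
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=128)
```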
 
 ## v0.7.1rc1
 
 🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
 
 vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
 
-Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
+Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
 
 ### Highlights
docs/source/user_guide/supported_models.md

Lines changed: 14 additions & 12 deletions
@@ -2,25 +2,27 @@
 
 | Model | Supported | Note |
 |---------|-----------|------|
-| Qwen 2.5 |✅||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
 | DeepSeek v3 |✅||
-| DeepSeek Distill (Qwen/llama) |✅||
+| DeepSeek R1 |✅||
+| DeepSeek Distill (Qwen/Llama) |✅||
+| Qwen2-VL |✅||
+| Qwen2-Audio |✅||
+| Qwen2.5 |✅||
+| Qwen2.5-VL |✅||
+| MiniCPM |✅||
 | Llama3.1/3.2 |✅||
+| Mistral | | Need test |
+| DeepSeek v2.5 | | Need test |
 | Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm |✅||
-| ChatGLM |✅||
-| InternVL 2.5 |✅||
-| Qwen2-VL |✅||
+| Baichuan | | Need test |
+| Internlm |✅||
+| ChatGLM |❌| Plan in Q2 |
+| InternVL2.5 |✅||
 | GLM-4v | | Need test |
 | Molmo |✅||
-| LLaVA 1.5 |✅||
+| LLaVA1.5 | | Need test |
 | Mllama | | Need test |
 | LLaVA-Next | | Need test |
 | LLaVA-Next-Video | | Need test |
 | Phi-3-Vision/Phi-3.5-Vision | | Need test |
 | Ultravox | | Need test |
-| Qwen2-Audio |✅||
Lines changed: 19 additions & 19 deletions
@@ -1,21 +1,21 @@
 # Feature Support
 
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill |❌| Plan in 2025 Q1 |
-| Automatic Prefix Caching | | Improve performance in 2025 Q2 |
-| LoRA |❌| Plan in 2025 Q1 |
-| Prompt adapter |❌| Plan in 2025 Q1 |
-| Speculative decoding |❌| Plan in 2025 Q1 |
-| Pooling |❌||
-| Enc-dec |❌| Plan in 2025 Q2 |
-| Multi Modality |✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
-| LogProbs |✅||
-| Prompt logProbs |✅||
-| Async output |✅||
-| Multi step scheduler |❌| Plan in 2025 Q1 |
-| Best of |✅||
-| Beam search |✅||
-| Guided Decoding | | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel |✅| Only "mp" supported now |
-| Pipeline Parallel |✅| Only "mp" supported now |
+| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
+|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
+| Chunked Prefill |❌| | | NA | Plan in 2025.03.30 |
+| Automatic Prefix Caching |❌| | | NA | Plan in 2025.03.30 |
+| LoRA |❌| | | NA | Plan in 2025.06.30 |
+| Prompt adapter |❌| | | NA | Plan in 2025.06.30 |
+| Speculative decoding |✅| | | Basic functions available | Needs full testing |
+| Pooling |✅| | | Basic functions available (BERT) | Needs full testing; add more model support |
+| Enc-dec |❌| | | NA | Plan in 2025.06.30 |
+| Multi Modality |✅| |✅| Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance and add more model support |
+| LogProbs |✅| | | Basic functions available | Needs full testing |
+| Prompt logProbs |✅| | | Basic functions available | Needs full testing |
+| Async output |✅| | | Basic functions available | Needs full testing |
+| Multi step scheduler |✅| | | Basic functions available | Needs full testing |
+| Best of |✅| | | Basic functions available | Needs full testing |
+| Beam search |✅| | | Basic functions available | Needs full testing |
+| Guided Decoding |✅| | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
+| Tensor Parallel |✅| | | Basic functions available | Needs full testing |
+| Pipeline Parallel |✅| | | Basic functions available | Needs full testing |
