**docs/source/developer_guide/versioning_policy.md** (+11 −9)

## vLLM Ascend Plugin versions

Each vllm-ascend release will be versioned `v[major].[minor].[micro][rcN][.postN]` (such as `v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`); a parsing sketch follows the examples below.

- **Final releases**: will typically be released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: will typically be released **on demand**, suffixed with `rcN` (the Nth release candidate), to support early testing by our users prior to a final release.
- **Post releases**: will typically be released **on demand** to address minor errors in a final release. Unlike the [PEP 440 post-release](https://peps.python.org/pep-0440/#post-releases) suggestion, a post release may contain actual bug fixes, because the final release version must match the vLLM final release version (`v[major].[minor].[micro]`) exactly. The post release is therefore published as a patch version of the final release.

For example:

- `v0.7.x`: the first final release, matching the vLLM `v0.7.x` version.
- `v0.7.3rc1`: the first pre-release of vllm-ascend.
- `v0.7.3.post1`: the post release published if the `v0.7.3` release has some minor errors.

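To make the scheme concrete, here is a minimal sketch of how such tags decompose under PEP 440; the use of the third-party `packaging` library is an assumption for illustration, not part of the release tooling:

```python
# Sketch: how v[major].[minor].[micro][rcN][.postN] tags decompose under PEP 440.
from packaging.version import Version  # pip install packaging

for tag in ("v0.7.3rc1", "v0.7.3", "v0.7.3.post1"):
    v = Version(tag)  # the leading "v" is allowed by PEP 440 and normalized away
    print(tag, v.release, v.pre, v.post)

# v0.7.3rc1    -> (0, 7, 3)  ('rc', 1)  None
# v0.7.3       -> (0, 7, 3)  None       None
# v0.7.3.post1 -> (0, 7, 3)  None       1
```
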
## Branch policy

vllm-ascend has a main branch and dev branches.

- **main**: the main branch, corresponding to the vLLM main branch; it is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: a development branch, created for selected new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for the vLLM `v0.7.3` version.

Usually, a commit should ONLY be merged into the main branch first, and then backported to the dev branch, to reduce maintenance costs as much as possible.

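As a concrete illustration of this flow (the commit hash `abc1234` is a placeholder, and in practice the backport lands through a pull request):

```shell
# The fix is merged into main first as commit abc1234, then backported:
git checkout v0.7.3-dev
git cherry-pick -x abc1234   # -x records the original main-branch commit in the message
git push origin v0.7.3-dev
```
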
```shell
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```

:::{note}
If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not currently supported by vllm-ascend.
:::

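For example, a minimal sketch of stripping that section with Python (the model path is a placeholder):

```python
import json

config_path = "/path/to/DeepSeek-R1/config.json"  # placeholder model path

with open(config_path) as f:
    config = json.load(f)

config.pop("quantization_config", None)  # remove the section if present

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```
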
Start the vLLM server on the head node:

```shell
# ...
```

Logs of the vllm server:

```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
```

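A request of the kind that produced this log line might look like the sketch below; the port and model name are assumptions that depend on how the server was launched:

```shell
curl http://127.0.0.1:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "deepseek-ai/DeepSeek-R1", "prompt": "Hello, my name is", "max_tokens": 32}'
```
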
**docs/source/user_guide/release_notes.md** (+29 −1)

# Release note

## v0.7.3rc1

🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.

- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
- Speculative decoding feature is supported (see the sketch after this list). [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)

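As a quick offline illustration of the speculative decoding path, here is a minimal sketch; the target/draft model pair is a placeholder, and the arguments follow the vLLM v0.7-series `LLM` API:

```python
from vllm import LLM, SamplingParams

# Placeholder target/draft pair; any compatible model pair works.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    speculative_model="Qwen/Qwen2.5-0.5B-Instruct",
    num_speculative_tokens=5,  # tokens drafted per step
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```
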
### Core

- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
- Added initial support for pooling models. Bert-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now (see the embedding sketch after this list). [#229](https://github.com/vllm-project/vllm-ascend/pull/229)

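A minimal embedding sketch for the pooling path, assuming the vLLM v0.7-series `task="embed"` API:

```python
from vllm import LLM

llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
outputs = llm.embed(["Hello, vLLM Ascend!"])
print(len(outputs[0].outputs.embedding))  # embedding dimension (768 for this model)
```
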
### Model

- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)

### Other

- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1. [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen and Qwen 2.5VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details.
- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807

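If you manage modelscope yourself, the pin above can be applied directly:

```shell
pip install "modelscope<1.23.0"
```
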
### Known issues

- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the accuracy of the output may be incorrect. We are working on it; it will be fixed in the next release.
- Improved and reduced the garbled text in model output. If you still hit the issue, try changing a generation config value, such as `temperature`, and try again (see the sketch after this list). There is also a known issue shown below. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)

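For example, a lower sampling temperature can be set per request; the values below are only illustrative:

```python
from vllm import SamplingParams

# Less random sampling tends to reduce garbled output.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=64)
```
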
## v0.7.1rc1

🎉 Hello, World!

We are excited to announce the first release candidate of v0.7.1 for vllm-ascend.

vLLM Ascend Plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.

Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19).

| Automatic Prefix Caching | ❌ ||| NA | Plan in 2025.03.30 |
| LoRA | ❌ ||| NA | Plan in 2025.06.30 |
| Prompt adapter | ❌ ||| NA | Plan in 2025.06.30 |
| Speculative decoding | ✅ ||| Basic functions available | Needs full testing |
| Pooling | ✅ ||| Basic functions available (Bert) | Needs full testing and more model support |
| Enc-dec | ❌ ||| NA | Plan in 2025.06.30 |
| Multi Modality | ✅ || ✅ | Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance and add more model support |
| LogProbs | ✅ ||| Basic functions available | Needs full testing |
| Prompt logProbs | ✅ ||| Basic functions available | Needs full testing |
| Async output | ✅ ||| Basic functions available | Needs full testing |
| Multi step scheduler | ✅ ||| Basic functions available | Needs full testing |
| Best of | ✅ ||| Basic functions available | Needs full testing |
| Beam search | ✅ ||| Basic functions available | Needs full testing |
| Guided Decoding | ✅ ||| Basic functions available | Find more details in the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ ||| Basic functions available | Needs full testing |
| Pipeline Parallel | ✅ ||| Basic functions available | Needs full testing |
0 commit comments