
Commit 4205078

22dimensions authored and nsdie committed
Upgrade to 0.11.1 newest vllm commit (vllm-project#3982)
### What this PR does / why we need it?

Adapt the vllm-ascend main branch to vLLM releases/v0.11.1:

- Fix `forward context not set` in test_vlm.py, caused by vllm-project/vllm#23207.
- Fix failing imports of `cdiv` and `round_down`, caused by vllm-project/vllm#27188.
- Fix failing import of `init_cached_hf_modules`, caused by vllm-project/vllm#27567.
- Adapt the Triton kernel `fused_recurrent_gated_delta_rule_fwd_kernel`, caused by vllm-project/vllm#27654.
- Remove unused code in sigmoid_gating.py: `class FusedRecurrentFunction`, `fused_recurrent_gated_delta_rule`, and `fused_recurrent_gated_delta_rule_fwd`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: 22dimensions <[email protected]>
Signed-off-by: nsdie <[email protected]>
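The import fixes above all follow the same version-gating pattern: check the installed vLLM version once, then import each helper from its old or new location. A minimal standalone sketch of that pattern — `vllm_version_is` is stubbed here (the real helper lives in `vllm_ascend.utils`), and a fallback definition lets the sketch run without vLLM installed:

```python
# Standalone sketch of the version-gated import pattern used in this PR.
INSTALLED_VLLM = "0.11.1"  # pretend version, for the sketch only


def vllm_version_is(target: str) -> bool:
    # Stub of vllm_ascend.utils.vllm_version_is.
    return INSTALLED_VLLM == target


try:
    if vllm_version_is("0.11.0"):
        from vllm.utils import cdiv  # old location (pre-#27188)
    else:
        from vllm.utils.math_utils import cdiv  # new location
except ImportError:
    # Fallback so the sketch runs without vLLM: cdiv is ceiling division.
    def cdiv(a: int, b: int) -> int:
        return -(a // -b)


print(cdiv(7, 2))  # 4
```

Gating on the version string (rather than a bare `try`/`except ImportError`) keeps the intent explicit: each branch names exactly which vLLM release it targets.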
1 parent 36f9e42 commit 4205078

File tree

21 files changed (+258, -242 lines)


.github/workflows/format_pr_body.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,7 +36,7 @@ jobs:
       - name: Get vLLM version
         run: |
-          VLLM_COMMIT=83f478bb19489b41e9d208b47b4bb5a95ac171ac
+          VLLM_COMMIT=2918c1b49c88c29783c86f78d2c4221cb9622379
           echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> $GITHUB_ENV

       - name: Checkout repository
```

.github/workflows/vllm_ascend_test.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -42,7 +42,7 @@ jobs:
   lint:
     uses: ./.github/workflows/pre-commit.yml
     with:
-      vllm: 83f478bb19489b41e9d208b47b4bb5a95ac171ac
+      vllm: 2918c1b49c88c29783c86f78d2c4221cb9622379
   changes:
     runs-on: ubuntu-latest
     outputs:
@@ -83,7 +83,7 @@ jobs:
       VLLM_USE_MODELSCOPE: True
     strategy:
       matrix:
-        vllm_version: [83f478bb19489b41e9d208b47b4bb5a95ac171ac, v0.11.0]
+        vllm_version: [2918c1b49c88c29783c86f78d2c4221cb9622379, v0.11.0]
     steps:
       - name: Install packages
         run: |
@@ -138,7 +138,7 @@ jobs:
     name: e2e-light
     strategy:
       matrix:
-        vllm_version: [83f478bb19489b41e9d208b47b4bb5a95ac171ac, v0.11.0]
+        vllm_version: [2918c1b49c88c29783c86f78d2c4221cb9622379, v0.11.0]
     # Note (yikun): If CI resource are limited we can split job into two chain jobs
     needs: [lint, changes]
     # only trigger e2e test after lint passed and the change is e2e related with pull request.
```

.github/workflows/vllm_ascend_test_full.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -69,7 +69,7 @@ jobs:
     name: e2e-full
     strategy:
       matrix:
-        vllm_version: [83f478bb19489b41e9d208b47b4bb5a95ac171ac, v0.11.0]
+        vllm_version: [2918c1b49c88c29783c86f78d2c4221cb9622379, v0.11.0]
     needs: [changes]
     if: ${{ needs.changes.outputs.e2e_tracker == 'true' }}
     uses: ./.github/workflows/_e2e_test.yaml
```

docs/source/community/versioning_policy.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -42,7 +42,7 @@ The table below is the release compatibility matrix for vLLM Ascend release.
 For main branch of vLLM Ascend, we usually make it compatible with the latest vLLM release and a newer commit hash of vLLM. Please note that this table is usually updated. Please check it regularly.

 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | v0.11.0/83f478bb19489b41e9d208b47b4bb5a95ac171ac | >= 3.10, < 3.12 | 8.3.RC1 | 2.7.1 / 2.7.1 |
+| main | v0.11.0/2918c1b49c88c29783c86f78d2c4221cb9622379 | >= 3.10, < 3.12 | 8.3.RC1 | 2.7.1 / 2.7.1 |

 ## Release cadence
```

tests/ut/worker/test_worker_v1.py

Lines changed: 6 additions & 3 deletions
```diff
@@ -8,6 +8,9 @@
 from tests.ut.base import TestBase
 from vllm_ascend.utils import vllm_version_is

+init_cached_hf_modules_path = "vllm.utils.init_cached_hf_modules" if vllm_version_is(
+    "0.11.0") else "vllm.utils.import_utils.init_cached_hf_modules"
+

 class TestNPUWorker(TestBase):

@@ -53,7 +56,7 @@ def setUp(self):
     @patch("vllm_ascend.worker.worker_v1.init_ascend_config")
     @patch("vllm_ascend.worker.worker_v1.init_ascend_soc_version")
     @patch("vllm_ascend.worker.worker_v1.try_register_lib")
-    @patch("vllm.utils.init_cached_hf_modules")
+    @patch(init_cached_hf_modules_path)
     @patch("vllm_ascend.worker.worker_v1.NPUWorker._init_profiler")
     def test_init_npu_worker_normal_case(
         self,
@@ -115,7 +118,7 @@ def test_init_npu_worker_normal_case(
     @patch("vllm_ascend.worker.worker_v1.init_ascend_config")
     @patch("vllm_ascend.worker.worker_v1.init_ascend_soc_version")
     @patch("vllm_ascend.worker.worker_v1.try_register_lib")
-    @patch("vllm.utils.init_cached_hf_modules")
+    @patch(init_cached_hf_modules_path)
     @patch("vllm_ascend.worker.worker_v1.NPUWorker._init_profiler")
     def test_init_npu_worker_with_trust_remote_code(
         self,
@@ -160,7 +163,7 @@ def test_init_npu_worker_with_trust_remote_code(
     @patch("vllm_ascend.worker.worker_v1.init_ascend_config")
     @patch("vllm_ascend.worker.worker_v1.init_ascend_soc_version")
     @patch("vllm_ascend.worker.worker_v1.try_register_lib")
-    @patch("vllm.utils.init_cached_hf_modules")
+    @patch(init_cached_hf_modules_path)
     @patch("vllm_ascend.worker.worker_v1.NPUWorker._init_profiler")
     def test_init_npu_worker_with_custom_cache_dtype(
         self,
```
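The test change above works because `@patch` takes its target as a plain dotted string, so that string can be computed once at module import time and reused across decorators. A hypothetical sketch of the same idea — the helper name is illustrative, and `os.getcwd` stands in for the real target so the example runs without vLLM:

```python
from unittest.mock import patch


def resolve_patch_target(installed: str) -> str:
    """Pick the dotted path to patch based on the installed vLLM version.

    Mirrors the module-level selection in the test file above; the
    version string handling here is illustrative only.
    """
    if installed == "0.11.0":
        return "vllm.utils.init_cached_hf_modules"
    return "vllm.utils.import_utils.init_cached_hf_modules"


# @patch accepts any computed string target; os.getcwd is used here
# purely so the sketch is runnable without vLLM installed.
target = "os.getcwd"
with patch(target, return_value="/tmp"):
    import os
    print(os.getcwd())  # /tmp
```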

vllm_ascend/attention/attention_v1.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -31,7 +31,14 @@
     get_decode_context_model_parallel_rank,
     get_decode_context_model_parallel_world_size)
 from vllm.forward_context import ForwardContext, get_forward_context
-from vllm.utils import cdiv
+
+from vllm_ascend.utils import vllm_version_is
+
+if vllm_version_is("0.11.0"):
+    from vllm.utils import cdiv
+else:
+    from vllm.utils.math_utils import cdiv
+
 from vllm.v1.attention.backends.utils import AttentionCGSupport
 from vllm.v1.core.sched.output import SchedulerOutput
 from vllm.v1.kv_cache_interface import AttentionSpec
```

vllm_ascend/attention/mla_v1.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -22,7 +22,14 @@
 from vllm.logger import logger
 from vllm.model_executor.layers.linear import (LinearBase,
                                                UnquantizedLinearMethod)
-from vllm.utils import cdiv, round_down
+
+from vllm_ascend.utils import vllm_version_is
+
+if vllm_version_is("0.11.0"):
+    from vllm.utils import cdiv, round_down
+else:
+    from vllm.utils.math_utils import cdiv, round_down
+
 from vllm.v1.attention.backends.utils import AttentionCGSupport

 from vllm_ascend import envs
```
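For reference, the two math helpers whose import location moved are small integer utilities; assumed-equivalent definitions (not copied from vLLM's source) look like this:

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: smallest integer >= a / b, for positive b."""
    return -(a // -b)


def round_down(x: int, multiple: int) -> int:
    """Round x down to the nearest multiple of `multiple`."""
    return (x // multiple) * multiple


# e.g. blocks needed for a 10-token request at block size 3, and the
# largest block-aligned length not exceeding 10:
print(cdiv(10, 3))        # 4
print(round_down(10, 3))  # 9
```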

vllm_ascend/core/scheduler.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -22,7 +22,14 @@
 from vllm.distributed.kv_events import KVEventBatch
 from vllm.logger import logger
 from vllm.multimodal import MULTIMODAL_REGISTRY, MultiModalRegistry
-from vllm.utils import cdiv
+
+from vllm_ascend.utils import vllm_version_is
+
+if vllm_version_is("0.11.0"):
+    from vllm.utils import cdiv
+else:
+    from vllm.utils.math_utils import cdiv
+
 from vllm.v1.core.kv_cache_manager import KVCacheBlocks
 from vllm.v1.core.sched.output import NewRequestData, SchedulerOutput
 from vllm.v1.core.sched.scheduler import Scheduler
```

vllm_ascend/distributed/mooncake/config_data.py

Lines changed: 9 additions & 1 deletion
```diff
@@ -9,7 +9,15 @@
 import torch
 from vllm.distributed.kv_transfer.kv_connector.v1.base import \
     KVConnectorMetadata
-from vllm.utils import cdiv, logger
+from vllm.utils import logger
+
+from vllm_ascend.utils import vllm_version_is
+
+if vllm_version_is("0.11.0"):
+    from vllm.utils import cdiv
+else:
+    from vllm.utils.math_utils import cdiv
+
 from vllm.v1.core.sched.output import NewRequestData

 DEFAULT_GLOBAL_SEGMENT_SIZE = 3355443200  # 3.125 GiB
```

vllm_ascend/models/qwen2_5_vl.py

Lines changed: 13 additions & 2 deletions
```diff
@@ -42,6 +42,7 @@
 from vllm.model_executor.models.utils import maybe_prefix
 from vllm.multimodal import MULTIMODAL_REGISTRY

+from vllm_ascend.ascend_forward_context import set_ascend_forward_context
 from vllm_ascend.utils import (ACL_FORMAT_FRACTAL_ND, is_enable_nz,
                                vllm_version_is)

@@ -536,7 +537,11 @@ def _process_image_input(self, image_input) -> tuple[torch.Tensor, ...]:
             image_embeds = image_input["image_embeds"].type(self.visual.dtype)
         else:
             pixel_values = image_input["pixel_values"].type(self.visual.dtype)
-            image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
+            if vllm_version_is("0.11.0"):
+                image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
+            else:
+                with set_ascend_forward_context(None, self.vllm_config):
+                    image_embeds = self.visual(pixel_values, grid_thw=grid_thw)

         # Split concatenated embeddings for each image item.
         merge_size = self.visual.spatial_merge_size
@@ -553,7 +558,13 @@ def _process_video_input(self, video_input) -> tuple[torch.Tensor, ...]:
         else:
             pixel_values_videos = video_input["pixel_values_videos"].type(
                 self.visual.dtype)
-            video_embeds = self.visual(pixel_values_videos, grid_thw=grid_thw)
+            if vllm_version_is("0.11.0"):
+                video_embeds = self.visual(pixel_values_videos,
+                                           grid_thw=grid_thw)
+            else:
+                with set_ascend_forward_context(None, self.vllm_config):
+                    video_embeds = self.visual(pixel_values_videos,
+                                               grid_thw=grid_thw)

         # Split concatenated embeddings for each video item.
         merge_size = self.visual.spatial_merge_size
```
