
Conversation


@WoosukKwon WoosukKwon commented Aug 20, 2025

Since our page size is 16, new_block_ids for decode requests is empty in 15 out of every 16 steps.
However, we currently serialize these empty lists on every step.
This PR removes this unnecessary overhead by making them None before serialization.

TODO: When block_size differs across groups, we should change `list[Optional[tuple[list[int], ...]]]` into `list[Optional[tuple[Optional[list[int]], ...]]]`.
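A minimal sketch of the idea (the helper name is hypothetical; the actual change lives in the scheduler/worker serialization path):

```python
from typing import Optional

# Hypothetical helper sketching the PR's idea: if no KV-cache group
# allocated a new block this step, serialize a single None instead of
# a tuple of empty lists.
def maybe_none(
    new_block_ids: tuple[list[int], ...],
) -> Optional[tuple[list[int], ...]]:
    if all(len(ids) == 0 for ids in new_block_ids):
        return None
    return new_block_ids

# With block_size=16, a decode request crosses a block boundary only
# every 16th step, so most steps serialize just None.
assert maybe_none(([], [])) is None          # no new blocks allocated
assert maybe_none(([42], [])) == ([42], [])  # boundary step: keep the IDs

# Consumers handle the None case by falling back to an empty tuple.
block_ids = maybe_none(([], [])) or ()
assert block_ids == ()
```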

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of these by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an optimization to avoid serializing empty block ID lists for decode requests, which occurs frequently. This is achieved by returning None instead of an empty list structure. The changes are consistently applied across the scheduler and worker components. The type hints have been updated to reflect the optional return value, and consumers of this data now correctly handle the None case. The refactoring to delay block ID generation is a clean approach. The code quality is high, and the changes appear correct and safe.

@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 20, 2025

@heheda12345 heheda12345 left a comment


LGTM!

@heheda12345

TODO: When block_size differs across groups, we should change `list[Optional[tuple[list[int], ...]]]` into `list[Optional[tuple[Optional[list[int]], ...]]]`.

I don't think this optimization is necessary.

And FYI, we now support mamba by setting block_size=max_model_len for mamba layers, so block_size already differs across groups.
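For context, the per-group variant mentioned in the TODO could look like the following sketch (hypothetical helper name; assumes each KV-cache group's block-ID list is independently optional, since groups with different block sizes cross block boundaries on different steps):

```python
from typing import Optional

# Hypothetical per-group variant: when KV-cache groups use different
# block sizes (e.g. mamba layers with block_size=max_model_len), one
# group may allocate a new block on a step where another does not, so
# each group's list is replaced with None independently.
def maybe_none_per_group(
    new_block_ids: tuple[list[int], ...],
) -> Optional[tuple[Optional[list[int]], ...]]:
    if all(not ids for ids in new_block_ids):
        return None  # nothing new in any group
    return tuple(ids if ids else None for ids in new_block_ids)

assert maybe_none_per_group(([], [])) is None
assert maybe_none_per_group(([7], [])) == ([7], None)
```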

@heheda12345 heheda12345 merged commit b029de9 into main Aug 21, 2025
49 of 50 checks passed
@heheda12345 heheda12345 deleted the woosuk/block-ids branch August 21, 2025 01:25
adobrzyn added a commit to vllm-project/vllm-gaudi that referenced this pull request Aug 21, 2025
…ne if empty (#93)

Culprit commit: vllm-project/vllm#23262

---------

Signed-off-by: Agata Dobrzyniewicz <[email protected]>
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Aug 21, 2025
### What this PR does / why we need it?
1. use action/checkout@v5 instead of v4
2. remove dbo test case because there is issue with it and will be
refactored later
3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it
4. fix sampler api changes introduced by
vllm-project/vllm#22387
5. fix qwen3 moe config changes introduced by
vllm-project/vllm#20562
6. fix kvcache block changes introduced by
vllm-project/vllm#23262

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@0c6e40b

---------

Signed-off-by: MengqingCao <[email protected]>
mswiniarsk pushed a commit to vllm-project/vllm-gaudi that referenced this pull request Aug 25, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed tpu Related to Google TPUs v1

3 participants