Conversation

@zhuohan123 (Member) commented on Sep 26, 2025

Purpose

Do not count preempted tokens in prefix cache hit rate. See #25780 for details.
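
For intuition, here is a toy model (not vLLM's actual accounting) of why preempted tokens inflate the stat: a preempted request has its computed tokens reset, and when it is rescheduled it looks up blocks it wrote itself, which register as hits even though no cross-request sharing occurred.

# Toy model of the hit-rate accounting, not vLLM's actual code.
# Each lookup contributes (queried tokens, hit tokens) to the stat.
lookups = [
    # (num_preemptions, queried_tokens, hit_tokens)
    (0, 1024, 0),     # fresh request with a random prompt: no real sharing
    (0, 1024, 0),
    (1, 1024, 1024),  # same request after preemption: "hits" its own blocks
    (1, 1024, 1024),
]

def hit_rate(entries):
    queries = sum(q for _, q, _ in entries)
    hits = sum(h for _, _, h in entries)
    return hits / queries if queries else 0.0

all_rate = hit_rate(lookups)
first_time = hit_rate([e for e in lookups if e[0] == 0])
print(f"all lookups: {all_rate:.1%}, first-time only: {first_time:.1%}")  # 50.0% vs 0.0%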

Test

Existing tests should pass. Now when we run:

vllm bench throughput --model NousResearch/Hermes-3-Llama-3.1-8B --dataset-name random --num-prompts 1000 --input-len 1024 --output-len 1024

We see:

INFO 09-26 14:24:45 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7097.7 tokens/s, Running: 260 reqs, Waiting: 475 reqs, GPU KV cache usage: 100.0%, Prefix cache hit rate: 0.2%

Prefix cache hit rate dropped from 30% to 0.2%.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly modifies the prefix cache hit rate calculation to exclude tokens from preempted requests. The changes are implemented by adding a preemption counter to requests, which is then used to separate statistics for new and preempted requests. The implementation is clean and directly addresses the issue. The associated refactoring in the scheduler improves code readability. I have reviewed the changes and found no issues.

self.encoder_cache_manager.free(preempted_req)
preempted_req.status = RequestStatus.PREEMPTED
preempted_req.num_computed_tokens = 0
preempted_req.num_preemptions += 1  # new: remember this request was preempted
@zhuohan123 (Member, Author) commented:

Note: this line is the only actual change in this file; all other changes are code-style changes.
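
For reference, a minimal sketch of how such a counter can gate the stats at lookup time; the names (PrefixCacheStats, record_lookup, the preempted_* buckets) are illustrative assumptions, not necessarily vLLM's actual internals:

from dataclasses import dataclass

@dataclass
class PrefixCacheStats:  # hypothetical shape, for illustration only
    queries: int = 0
    hits: int = 0
    preempted_queries: int = 0
    preempted_hits: int = 0

def record_lookup(stats: PrefixCacheStats, num_preemptions: int,
                  num_queried: int, num_hits: int) -> None:
    # Only a request's first scheduling feeds the reported hit rate;
    # lookups after a preemption go into separate buckets.
    if num_preemptions > 0:
        stats.preempted_queries += num_queried
        stats.preempted_hits += num_hits
    else:
        stats.queries += num_queried
        stats.hits += num_hits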

@WoosukKwon added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 26, 2025
@WoosukKwon (Collaborator) left a comment:

LGTM. Thanks for doing this.

@zhuohan123 enabled auto-merge (squash) September 26, 2025 22:38
@zhuohan123 merged commit 8bf8f45 into main Sep 27, 2025
44 checks passed
@zhuohan123 deleted the zhuohan/fix-prefix-caching-stat branch September 27, 2025 00:16
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1
