@WoosukKwon (Collaborator) commented Sep 19, 2025

Key Changes

  • Remove persistent batch
    • No “reordering” or complex bookkeeping
    • Almost all CPU state lives in NumPy arrays → we can vectorize most of the Python loops in pre-/post-processing
    • Simpler handling for requests resumed from preemption
  • GPU-persistent block tables
    • The CPU does not hold the block tables at all; the GPU maintains the persistent block tables.
    • In every step, we send only the “diffs” to the GPU and use a Triton kernel to update the persistent block tables.
    • We also use another Triton kernel to build the ephemeral block tables used for each forward pass.
    • More scalable as max_model_len and num_kv_groups increase
  • Triton-native sampler
    • No -1 temperature hack for greedy sampling
    • Efficient support for per-request seeds
    • Efficient support for logprobs by only materializing the top-k logprobs instead of the whole vocab
    • Memory-efficient implementation of prompt logprobs
  • Simple implementation of DP
  • Simple CUDA graphs
  • Efficient support for structured outputs
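To make the “diff”-based block-table update concrete, here is a minimal CPU-side sketch in NumPy. This is a hypothetical illustration, not the PR's actual code: in the PR the persistent tables live on the GPU and the scatter is done by a Triton kernel, and the function name and array shapes here are assumptions for the example.

```python
import numpy as np

# Hypothetical sketch of the "diff" update: instead of re-sending full
# per-request block tables each step, only the changed (row, slot, value)
# triples are sent and scattered into the persistent table. In the PR this
# scatter runs on the GPU via a Triton kernel; NumPy emulates it here.

def apply_block_table_diffs(block_tables, req_indices, slot_indices, new_blocks):
    """Scatter newly allocated physical block IDs into the persistent table.

    block_tables: (max_num_reqs, max_blocks_per_req) int32 array
    req_indices:  (num_diffs,) row (request) of each change
    slot_indices: (num_diffs,) column (block slot) of each change
    new_blocks:   (num_diffs,) physical block IDs to write
    """
    # Fancy-index assignment writes all diffs in one vectorized scatter.
    block_tables[req_indices, slot_indices] = new_blocks
    return block_tables

# Example: requests 0 and 2 each grow by one KV-cache block this step.
tables = np.full((4, 8), -1, dtype=np.int32)  # -1 = unallocated slot
apply_block_table_diffs(
    tables,
    req_indices=np.array([0, 2]),
    slot_indices=np.array([0, 3]),
    new_blocks=np.array([17, 42]),
)
```

The payload sent to the GPU each step is proportional to the number of changed slots, not to `max_model_len`, which is why the approach scales as model lengths grow.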
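The logprobs point can also be sketched: materializing only a `(num_reqs, k)` slice instead of the full `(num_reqs, vocab_size)` log-prob matrix is the memory saving the Triton-native sampler targets. The NumPy version below is an assumed illustration of the idea, not the PR's kernel.

```python
import numpy as np

def topk_logprobs(logits, k):
    """Return (values, token_ids) of the top-k log-probs per request.

    Only a (num_reqs, k) result is materialized; the full
    (num_reqs, vocab_size) log-prob matrix is never stored long-term.
    """
    # Numerically stable log-softmax over the vocab dimension.
    m = logits.max(axis=-1, keepdims=True)
    logprobs = logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    # argpartition finds the k largest entries per row without a full sort.
    idx = np.argpartition(-logprobs, k - 1, axis=-1)[:, :k]
    vals = np.take_along_axis(logprobs, idx, axis=-1)
    # Sort just the k selected entries in descending order.
    order = np.argsort(-vals, axis=-1)
    return np.take_along_axis(vals, order, axis=-1), np.take_along_axis(idx, order, axis=-1)

# One request over a toy 4-token vocab.
logits = np.array([[2.0, 0.5, 1.0, -1.0]])
vals, ids = topk_logprobs(logits, k=2)
```

For a real vocabulary (tens of thousands of tokens) and small `k`, this cuts the logprobs buffer by orders of magnitude per request.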

Signed-off-by: Woosuk Kwon <[email protected]>
@mergify mergify bot removed the needs-rebase label Nov 20, 2025
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 20, 2025
@WoosukKwon WoosukKwon merged commit 30b44a1 into main Nov 21, 2025
54 of 58 checks passed
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Nov 21, 2025
@WoosukKwon WoosukKwon deleted the woosuk/model-runner-v2 branch November 21, 2025 16:20
@njhill (Member) left a comment

Approved :)

@DarkLight1337 (Member)

Looks like this is failing pre-commit on main

@WoosukKwon (Collaborator, Author)

Yeah, let me fix the error.

The PR somehow passed the pre-commit check in CI.

ywang96 pushed a commit to ywang96/vllm that referenced this pull request Nov 23, 2025
@Selkh commented Nov 24, 2025

Will V2 support async spec decoding in a different way, or pick up the implementation from V1?

lpapavassiliou pushed a commit to lpapavassiliou/vllm that referenced this pull request Nov 24, 2025
RunkaiTao pushed a commit to RunkaiTao/vllm that referenced this pull request Nov 24, 2025
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Dec 6, 2025

Labels

ci/build, documentation, nvidia, ready, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[RFC]: Redesigning Persistent Batch in vLLM