[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled #26586
Conversation
…put are enabled Signed-off-by: southfreebird <[email protected]>
Code Review
This pull request addresses a critical bug that causes a RuntimeError when speculative decoding and structured output are used together with logit processors. The root cause is that stale speculative token data could persist in InputBatch if the scheduler drops all draft tokens for a request, leading to out-of-bounds errors in subsequent penalty calculations. The fix correctly ensures that InputBatch.spec_token_ids is always updated, even with an empty list of tokens, thus preventing state corruption. The change is logical, well-commented, and effectively resolves the issue. The implementation looks correct.
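The fix described above can be sketched in isolation. This is a hypothetical illustration of the buggy-vs-fixed update pattern, not vLLM's actual `gpu_model_runner.py` code: the `InputBatch` shape and function names here are simplified stand-ins based on the review's description.

```python
# Minimal sketch of the stale-draft-token bug and its fix, assuming a
# simplified InputBatch; the real vLLM structures differ.

class InputBatch:
    def __init__(self) -> None:
        # Per-request speculative (draft) token ids from the last step.
        self.spec_token_ids: dict[str, list[int]] = {}


def update_states_buggy(batch: InputBatch,
                        scheduled_spec_decode_tokens: dict[str, list[int]]) -> None:
    # Buggy pattern: spec_token_ids is only assigned for requests that
    # appear in scheduled_spec_decode_tokens. If the scheduler dropped
    # all draft tokens for a request (e.g. none met the structural
    # schema), the mapping is empty and stale tokens persist, later
    # causing out-of-bounds indexing in penalty calculations.
    for req_id, draft_ids in scheduled_spec_decode_tokens.items():
        batch.spec_token_ids[req_id] = draft_ids


def update_states_fixed(batch: InputBatch,
                        req_ids: list[str],
                        scheduled_spec_decode_tokens: dict[str, list[int]]) -> None:
    # Fixed pattern: always assign, defaulting to an empty list, so a
    # request whose draft tokens were all dropped cannot keep stale
    # spec_token_ids from a previous step.
    for req_id in req_ids:
        batch.spec_token_ids[req_id] = scheduled_spec_decode_tokens.get(req_id, [])
```

With the buggy variant, a request absent from `scheduled_spec_decode_tokens` silently retains last step's draft tokens; the fixed variant resets it to `[]`, matching the PR's change of always updating `InputBatch.spec_token_ids` even with an empty list.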
vllm/v1/worker/gpu_model_runner.py
Outdated
# meet the structural schema. This means that
# scheduler_output.scheduled_spec_decode_tokens might be empty,
# even when speculative decoding is enabled. So, we moved this line
# from the 'if' block above.
Please rephrase the comment so that it explains the current state of the code rather than the change made to it. Comments about moved lines become less meaningful over time as the code is refactored.
Signed-off-by: southfreebird <[email protected]>
benchislett
left a comment
LGTM, Thanks!
@southfreebird could you rebase on latest main?
…dec-and-structural-output
2 critical fixes that cannot be implemented as plugins:

1. Qwen3 tool parser fix (line 523):
   - Fixes missing opening brace in streaming tool calls
   - One-line fix: removed buggy condition
   - Upstreamable: Yes

2. Eagle rejection sampler fix (gpu_model_runner.py):
   - Cherry-picked from PR vllm-project#26586 (pending upstream merge)
   - Fixes RuntimeError with Eagle + penalties
   - Moved spec_token_ids assignment outside if block

Plus minor fixes:
- DeepSeek R1 reasoning parser import
- Config __init__.py ordering

See: IN_TREE_MODIFICATIONS.md for details

Signed-off-by: Pradyun Ramadorai <[email protected]>
Merged 8 commits from origin/main including:
- PR vllm-project#26586: Eagle rejection sampler fix (previously cherry-picked)
- LoRA CUDA graph specialization (vllm-project#25914)
- Bee-8B VLM model support (vllm-project#27012)
- Utilities reorganization (network_utils, async_utils, etc.)
- Multiple bug fixes and improvements

In-Tree Modifications:
- Removed Eagle rejection sampler cherry-pick (now in upstream)
- Kept Qwen3 tool parser fix (still needed, line 523)
- Only 1 active in-tree modification remaining

Plugin Compatibility:
- All 10 plugin patches load successfully
- No target class changes required
- Clean merge with no conflicts

Documentation Updates:
- Updated IN_TREE_MODIFICATIONS.md (moved Eagle fix to Removed/Obsolete)
- Updated CLAUDE.md merge history
- Verified clean diff with origin/main (3 files, all documented)

Signed-off-by: Pradyun Ramadorai <[email protected]>
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <[email protected]>
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <[email protected]> Signed-off-by: Alberto Perdomo <[email protected]>
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Fix an error that appears after #19482 when logit processors (such as penalties) are enabled together with speculative decoding and structural output. An example of the error:
Purpose
Test Plan
Test Result