
Conversation

southfreebird (Contributor) commented on Oct 10, 2025

Fix an error, introduced by #19482, that appears when logit processors (such as penalties) are enabled together with speculative decoding and structured output. An example of the error:

File "/vllm/model_executor/layers/utils.py", line 45, in get_token_bin_counts_and_mask
     bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
RuntimeError: Expected index [24, 4162] to be no larger than self [21, 201089] apart from dimension 1 and to be no larger size than src [24, 4162]
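
For context, the failure mode can be reproduced outside vLLM with a few lines of PyTorch. The tensor shapes below are taken from the traceback; the variable names and the random fill are illustrative and not the actual vLLM code path:

    import torch

    vocab_size = 201_088
    num_reqs_in_batch = 21       # rows allocated for bin_counts
    stale_num_rows = 24          # rows in `tokens` once stale spec tokens leak in
    max_tokens_per_row = 4_162

    # One histogram row per request, as in get_token_bin_counts_and_mask.
    bin_counts = torch.zeros(num_reqs_in_batch, vocab_size + 1, dtype=torch.long)
    tokens = torch.randint(0, vocab_size, (stale_num_rows, max_tokens_per_row))

    # scatter_add_ requires the index tensor to be no larger than self in every
    # dimension except the scatter dimension, so the mismatched row counts
    # (24 vs 21) raise the RuntimeError quoted above.
    bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))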

Purpose

Test Plan

Test Result

gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug that causes a RuntimeError when speculative decoding and structured output are used together with logit processors. The root cause is that stale speculative token data could persist in InputBatch if the scheduler drops all draft tokens for a request, leading to out-of-bounds errors in subsequent penalty calculations. The fix correctly ensures that InputBatch.spec_token_ids is always updated, even with an empty list of tokens, thus preventing state corruption. The change is logical, well-commented, and effectively resolves the issue. The implementation looks correct.
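
To make the described fix concrete, here is a self-contained toy model of the stale-state problem and the always-assign solution. The names (spec_token_ids, scheduled_spec_decode_tokens, req_index) follow this description and the quoted comment below; none of this is the real gpu_model_runner.py code:

    # Scheduler step where structured output rejected every draft token:
    scheduled_spec_decode_tokens = {}            # no drafts scheduled for "req-0"
    input_batch_spec_token_ids = [[11, 12, 13]]  # stale drafts from a previous step
    req_id, req_index = "req-0", 0

    spec_token_ids = scheduled_spec_decode_tokens.get(req_id, [])

    # Buggy variant: the entry is only written when new drafts exist,
    # so the stale tokens from the previous step survive.
    if spec_token_ids:
        input_batch_spec_token_ids[req_index] = spec_token_ids
    assert input_batch_spec_token_ids[req_index] == [11, 12, 13]  # stale!

    # Fixed variant: the entry is always written, so an empty list
    # clears any stale speculative tokens for the request.
    input_batch_spec_token_ids[req_index] = spec_token_ids
    assert input_batch_spec_token_ids[req_index] == []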

# meet the structural schema. This means that
# scheduler_output.scheduled_spec_decode_tokens might be empty,
# even when speculative decoding is enabled. So, we moved this line
# from the 'if' block above.
Collaborator left a comment

Please rephrase the comment so that it explains the state of the code and not the change to the code. Comments about moved lines can become less meaningful over time with refactoring.
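
For illustration only, wording along these lines would describe the current behaviour rather than the edit (hypothetical phrasing, not taken from the PR):

    # Structured output can reject every draft token for a request, so
    # scheduler_output.scheduled_spec_decode_tokens may be empty even when
    # speculative decoding is enabled. Always write spec_token_ids here so
    # that a stale entry from a previous step cannot survive.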

Signed-off-by: southfreebird <[email protected]>
benchislett (Collaborator) left a comment

LGTM, Thanks!

njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 18, 2025
njhill enabled auto-merge (squash) on October 18, 2025 18:16
njhill (Member) commented on Oct 18, 2025

@southfreebird could you rebase on latest main?

njhill merged commit f6fdacd into vllm-project:main on Oct 19, 2025
46 checks passed
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Oct 20, 2025
2 critical fixes that cannot be implemented as plugins:

1. Qwen3 tool parser fix (line 523):
   - Fixes missing opening brace in streaming tool calls
   - One-line fix: removed buggy condition
   - Upstreamable: Yes

2. Eagle rejection sampler fix (gpu_model_runner.py):
   - Cherry-picked from PR vllm-project#26586 (pending upstream merge)
   - Fixes RuntimeError with Eagle + penalties
   - Moved spec_token_ids assignment outside if block

Plus minor fixes:
- DeepSeek R1 reasoning parser import
- Config __init__.py ordering

See: IN_TREE_MODIFICATIONS.md for details

Signed-off-by: Pradyun Ramadorai <[email protected]>
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Oct 20, 2025
Merged 8 commits from origin/main including:
- PR vllm-project#26586: Eagle rejection sampler fix (previously cherry-picked)
- LoRA CUDA graph specialization (vllm-project#25914)
- Bee-8B VLM model support (vllm-project#27012)
- Utilities reorganization (network_utils, async_utils, etc.)
- Multiple bug fixes and improvements

In-Tree Modifications:
- Removed Eagle rejection sampler cherry-pick (now in upstream)
- Kept Qwen3 tool parser fix (still needed, line 523)
- Only 1 active in-tree modification remaining

Plugin Compatibility:
- All 10 plugin patches load successfully
- No target class changes required
- Clean merge with no conflicts

Documentation Updates:
- Updated IN_TREE_MODIFICATIONS.md (moved Eagle fix to Removed/Obsolete)
- Updated CLAUDE.md merge history
- Verified clean diff with origin/main (3 files, all documented)

Signed-off-by: Pradyun Ramadorai <[email protected]>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
adabeyta pushed a commit to adabeyta/vllm that referenced this pull request Oct 20, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <[email protected]>
Signed-off-by: Alberto Perdomo <[email protected]>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <[email protected]>
Signed-off-by: 0xrushi <[email protected]>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…tural output are enabled (vllm-project#26586)

Signed-off-by: southfreebird <[email protected]>
Signed-off-by: 0xrushi <[email protected]>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready, v1
