@wallashss
Collaborator

Description

This PR fixes how logits processors that must be updated at every engine step receive those updates. To validate the change, I updated the existing min tokens test, which exposes the wrong behaviour. Note: the bug is reproducible with both continuous batching (CB) and static batching (SB).

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks; otherwise, your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

    # After min_tokens is reached, the logits processor is
    # properly cleared.
    assert len(output1.outputs[0].token_ids) < 20
    assert len(output2.outputs[0].token_ids) < 10
Collaborator

@tjohnson31415 Oct 27, 2025


If we increase the eos_id logit bias to force EOS to be generated, then we can assert on the exact output length, right?

    assert len(output1.outputs[0].token_ids) == 11
    assert len(output2.outputs[0].token_ids) == 1

(the values for those asserts may be off-by-one depending on how EOS is tracked in the outputs 😅)
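To illustrate why a large EOS bias makes an exact-length assert possible, here is a toy greedy-decoding loop in plain Python. It is not vLLM code: `toy_generate`, `EOS_ID`, `VOCAB_SIZE`, and the random logits are hypothetical. It assumes the min-tokens mask is lifted correctly and that EOS is appended to the output, which is where the off-by-one comes from.

```python
import random

EOS_ID = 0  # hypothetical EOS token id in a toy 8-token vocabulary
VOCAB_SIZE = 8


def toy_generate(min_tokens: int, max_tokens: int, eos_bias: float, seed: int) -> list[int]:
    """Greedy decoding over random logits with a min-tokens mask and an EOS bias.

    Toy model only: it assumes the min-tokens mask is lifted as soon as
    `min_tokens` tokens exist, and it appends EOS to the output.
    """
    rng = random.Random(seed)
    out: list[int] = []
    while len(out) < max_tokens:
        logits = [rng.uniform(-1.0, 1.0) for _ in range(VOCAB_SIZE)]
        logits[EOS_ID] += eos_bias  # logit bias pushing EOS to the top
        if len(out) < min_tokens:
            logits[EOS_ID] = float("-inf")  # min tokens: EOS not allowed yet
        token = max(range(VOCAB_SIZE), key=logits.__getitem__)
        out.append(token)  # counting EOS here is the off-by-one question
        if token == EOS_ID:
            break
    return out


# With a large bias, EOS wins on the first unmasked step, so lengths are exact.
print([len(toy_generate(10, 20, eos_bias=100.0, seed=s)) for s in range(3)])  # [11, 11, 11]
print(len(toy_generate(0, 20, eos_bias=100.0, seed=0)))  # 1
```

Without the bias, the step at which EOS first wins depends on the random logits, so only an upper bound such as `< 20` can be asserted.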

Collaborator Author


NP, PTAL

Signed-off-by: Wallas Santos <[email protected]>
Collaborator

@tjohnson31415 left a comment


LGTM! Thanks!

@tjohnson31415 enabled auto-merge (squash) October 27, 2025 20:43
github-actions bot added the "ready" label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Oct 27, 2025
@wallashss
Collaborator Author

bot:test

@tjohnson31415 merged commit 7ed0611 into main Oct 27, 2025
30 of 40 checks passed
@tjohnson31415 deleted the wallas-fix-min-tokens branch October 27, 2025 20:58
tjohnson31415 added a commit that referenced this pull request Oct 29, 2025
…545)

# Description

The MinTokensLogitsProcessor needs `update_state` to be called at each step,
even when there is no `batch_update`, to detect when enough tokens have been
generated. The `LogitProcessorWrapper` copied the typical logic of skipping
updates when `batch_update` is None, so the min tokens processor never got
the `update_state` call it needed.
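A toy reproduction of this failure mode in plain Python. `ToyMinTokensProc`, `SkippingWrapper`, `AlwaysWrapper`, and `run` are hypothetical stand-ins, not the vLLM or vllm-spyre classes. Consistent with the description above, the toy processor only re-checks its output length inside `update_state`, so a wrapper that returns early on `batch_update is None` leaves EOS masked until `max_tokens`.

```python
from typing import Optional

EOS_ID = 0  # hypothetical EOS token id in a 3-token toy vocabulary


class ToyMinTokensProc:
    """Toy min-tokens processor: masks EOS until `min_tokens` ids exist.

    It holds a reference to the request's output ids and only re-checks the
    min-tokens condition inside `update_state`.
    """

    def __init__(self, min_tokens: int, output_ids: list[int]):
        self.min_tokens = min_tokens
        self.output_ids = output_ids  # shared list that grows every step
        self.masking = True

    def update_state(self, batch_update: Optional[str]) -> None:
        if self.masking and len(self.output_ids) >= self.min_tokens:
            self.masking = False

    def apply(self, logits: list[float]) -> list[float]:
        if self.masking:
            logits = logits.copy()
            logits[EOS_ID] = float("-inf")
        return logits


class SkippingWrapper:
    """Copies the common 'skip when batch_update is None' shortcut."""

    def __init__(self, proc: ToyMinTokensProc):
        self.proc = proc

    def update_state(self, batch_update: Optional[str]) -> None:
        if batch_update is None:
            return  # the wrapped processor never sees steady-state steps
        self.proc.update_state(batch_update)


class AlwaysWrapper(SkippingWrapper):
    """Forwards every call, including batch_update=None."""

    def update_state(self, batch_update: Optional[str]) -> None:
        self.proc.update_state(batch_update)


def run(wrapper_cls, min_tokens: int = 5, max_tokens: int = 20) -> list[int]:
    out: list[int] = []
    wrapper = wrapper_cls(ToyMinTokensProc(min_tokens, out))
    for step in range(max_tokens):
        # Only the first step has a batch change (the request is added).
        wrapper.update_state("added" if step == 0 else None)
        logits = wrapper.proc.apply([10.0, 1.0, 2.0])  # EOS wins when unmasked
        token = max(range(len(logits)), key=logits.__getitem__)
        out.append(token)
        if token == EOS_ID:
            break
    return out


print(len(run(SkippingWrapper)))  # 20: EOS stays masked until max_tokens
print(len(run(AlwaysWrapper)))  # 6: 5 non-EOS tokens, then EOS
```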

The fix here is to always call `update_state` on each of the wrapped
logitsprocs in the batch, with some extra code to avoid calling
`update_state` for a particular index more than once.
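One way to sketch the "always update, but at most once per index" rule, assuming one wrapped processor per batch slot and a simplified, hypothetical `ToyBatchUpdate` that only lists added and removed slot indices. `RecordingProc` and `PerSlotWrapper` are also hypothetical; this is an illustration of the idea, not the vllm-spyre implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyBatchUpdate:
    """Hypothetical, simplified batch update: slot indices that changed this step."""

    added: list[int] = field(default_factory=list)
    removed: list[int] = field(default_factory=list)


class RecordingProc:
    """Toy per-slot processor that records every update_state call it receives."""

    def __init__(self) -> None:
        self.calls: list[Optional[list[str]]] = []

    def update_state(self, changes: Optional[list[str]]) -> None:
        self.calls.append(changes)


class PerSlotWrapper:
    """Calls update_state on every slot each step, exactly once per slot."""

    def __init__(self, batch_size: int) -> None:
        self.procs = [RecordingProc() for _ in range(batch_size)]

    def update_state(self, batch_update: Optional[ToyBatchUpdate]) -> None:
        # Group all changes by slot index so no slot is updated twice.
        changes: dict[int, list[str]] = {}
        if batch_update is not None:
            for index in batch_update.removed:
                changes.setdefault(index, []).append("removed")
            for index in batch_update.added:
                changes.setdefault(index, []).append("added")
        # Unchanged slots still get a call (with None) so they can refresh state.
        for index, proc in enumerate(self.procs):
            proc.update_state(changes.get(index))


wrapper = PerSlotWrapper(batch_size=3)
wrapper.update_state(ToyBatchUpdate(added=[0, 1]))
wrapper.update_state(None)  # steady-state step: still reaches every slot
wrapper.update_state(ToyBatchUpdate(removed=[1], added=[1]))  # slot 1 reused
print([len(p.calls) for p in wrapper.procs])  # [3, 3, 3]
print(wrapper.procs[1].calls[-1])  # ['removed', 'added']
```

Grouping every change for a slot into one call, rather than simply skipping repeated indices, keeps a slot that is removed and re-added in the same step from losing its "added" notification.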

## Related Issues

- Follow-up to #544, which fixed the behavior for static batching.
- Cherry-picked improvement to test_sampling_params.py from #536.

---------

Signed-off-by: Travis Johnson <[email protected]>
Co-authored-by: Wallas Santos <[email protected]>
