[V1] Detokenizer: Respect Stop Tokens + `not include_stop_str_in_output` #14624

afeldman-nm · 2025-03-11T15:52:31Z

This PR plumbs the engine core finish_reason into the incremental_detokenizer.update() method. The invariant is that if finish_reason == STOP in the engine core output, the engine core must have detected a token-based stop condition (EOS or a stop token). In that case, the detokenizer truncates the most recent token's detokenized text from the output text.
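
To make the invariant concrete, here is a minimal sketch of a token-based stop check on the engine-core side. The names `FinishReason`, `check_token_stop`, and all of its parameters are hypothetical stand-ins for illustration, not the actual vLLM code:

```python
import enum
from typing import Collection, Optional


class FinishReason(enum.Enum):
    # Hypothetical stand-in for the engine core's finish reasons.
    STOP = "stop"
    LENGTH = "length"


def check_token_stop(
    last_token_id: int,
    eos_token_id: Optional[int],
    stop_token_ids: Collection[int],
    num_output_tokens: int,
    max_tokens: int,
) -> Optional[FinishReason]:
    """Return a finish reason for the newest token, or None to keep going."""
    # Token-based stops: in this sketch these are the only paths that
    # yield STOP, so STOP implies the last token is EOS or a stop token.
    if eos_token_id is not None and last_token_id == eos_token_id:
        return FinishReason.STOP
    if last_token_id in stop_token_ids:
        return FinishReason.STOP
    # Length-based stop: not token-based, so it gets a different reason.
    if num_output_tokens >= max_tokens:
        return FinishReason.LENGTH
    return None
```

Under this assumption, a consumer that sees STOP can treat the final token id as the stop token without re-checking it.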

FIX #14623

Signed-off-by: Andrew Feldman <[email protected]>

github-actions · 2025-03-11T15:52:43Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vllm/v1/core/scheduler.py

markmc

I'm not super familiar with any of this. I'd need to study V0 behavior before being confident in this approach, basically to answer my questions in comments. Have you compared to V0?

vllm/v1/engine/detokenizer.py

vllm/v1/core/scheduler.py

vllm/v1/engine/detokenizer.py

afeldman-nm · 2025-03-11T17:16:06Z

Not quite finished yet (need to add unit tests, refactor, address feedback) but opening for review since we are trying to finish ASAP.

njhill

Thanks @afeldman-nm.

My thought is similar to @markmc and @robertgshaw2-redhat's.

My suggestion would be:

1. Pass a bool stop_terminated flag to update(), which can be set when finish_reason == STOP.
2. Inside update(), if stop_terminated is True and self.include_stop_str_in_output is False, skip detokenizing the last token in new_token_ids (which is typically the only token but might not be). We still want to add that token to self.token_ids but don't need to do anything else for it that's inside the for loop.
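
A toy sketch of this suggestion, assuming the skip applies when `include_stop_str_in_output` is false (as the PR title indicates). `ToyDetokenizer` and its dictionary-based vocabulary are hypothetical; the real class decodes incrementally with a tokenizer:

```python
class ToyDetokenizer:
    """Hypothetical stand-in for the incremental detokenizer."""

    def __init__(self, vocab: dict, include_stop_str_in_output: bool = False):
        self.vocab = vocab  # token id -> text piece (toy decoding)
        self.include_stop_str_in_output = include_stop_str_in_output
        self.token_ids = []
        self.output_text = ""

    def update(self, new_token_ids: list, stop_terminated: bool) -> None:
        if not new_token_ids:
            return

        skipped_stop_token_id = None
        if stop_terminated and not self.include_stop_str_in_output:
            # The caller says the last token is the stop token:
            # keep it out of the detokenization loop.
            skipped_stop_token_id = new_token_ids[-1]
            new_token_ids = new_token_ids[:-1]

        for token_id in new_token_ids:
            self.token_ids.append(token_id)
            self.output_text += self.vocab[token_id]

        if skipped_stop_token_id is not None:
            # Still record the token id, just not its text.
            self.token_ids.append(skipped_stop_token_id)


vocab = {1: "Hello", 2: " world", 3: "</s>"}
detok = ToyDetokenizer(vocab)
detok.update([1, 2], stop_terminated=False)
detok.update([3], stop_terminated=True)
```

After the two calls, `detok.token_ids` holds all three ids while `detok.output_text` contains only the text of the first two.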

I think that would also address @robertgshaw2-redhat's brittleness concerns.

An alternative suggestion would be to just have the detokenizer also handle the eos and stop token id evaluation like it does for stop strings, and move it out of the engine. The downside of this is that we would be generating an extra token in those cases (and eos is most common) so I'd probably still prefer to avoid it.

njhill · 2025-03-11T17:53:59Z

On second thoughts, maybe @robertgshaw2-redhat's suggestion is better, basically to do the eos and stop token id checking in the update method and handle selective detokenization of the final token accordingly. But also leave it in the engine.

An annoying thing about this is that we don't need to send an abort and it would be preferable not to. But then we need to have additional conditional logic in the calling output processor method to only send the abort if the stop_reason is a str (not if it's an int or None).
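
The conditional described here could look roughly like the following. The helper name `needs_abort` is hypothetical; the sketch only assumes that `stop_reason` is a `str` for a matched stop string, an `int` for a stop token id, and `None` otherwise:

```python
from typing import Union


def needs_abort(stop_reason: Union[int, str, None]) -> bool:
    """Return True only when the stop was a string match (hypothetical helper)."""
    # A str stop_reason means a stop string matched; an int or None means
    # no string match, so no abort is sent in those cases.
    return isinstance(stop_reason, str)
```

In this sketch only a stop-string match triggers an abort, presumably because in the other cases the engine core has already finished the request itself.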

afeldman-nm · 2025-03-11T19:13:25Z

I implemented roughly a compromise between your suggestion and Robert's: the last token skips detokenization if a stop token was detected, and the detokenizer's stop-token check validates that the last token is actually a valid EOS or stop token.

afeldman-nm · 2025-03-11T21:26:42Z

JFYI, currently adding unit tests, still WIP

robertgshaw2-redhat · 2025-03-13T00:16:44Z

Is this ready to merge?

njhill

@afeldman-nm I think the changes can be much smaller, I don't think we need any new methods.

I've gone back to thinking my first suggestion would be better. I made the changes in a commit here to illustrate: njhill@bc26b30, feel free to pull them in.

vllm/v1/engine/detokenizer.py

afeldman-nm · 2025-03-13T05:31:14Z

Good restructuring @njhill. One minor change: I incorporated Rob's suggestion (actually validating that the engine core stop was triggered by an EOS or an element of stop_token_ids) into your commit. This restructures things slightly, but overall I was able to avoid adding any helper functions and maintain simplicity.

afeldman-nm · 2025-03-13T06:10:47Z

I think that this PR is /ready

njhill · 2025-03-13T12:52:52Z

Thanks @afeldman-nm. IMHO though the extra validation (and associated additional complexity) here is unnecessary. Passing stop_terminated=True to that method is telling it that the request is stopping due to the last token being a stop token (which in turn has been communicated via STOP finish reason returned by the engine). So it seems a bit arbitrary to re-check. I really think we should keep it simpler.

afeldman-nm · 2025-03-13T15:06:48Z

Okay @njhill, I made a change to reflect your suggestion.

njhill

@afeldman-nm just a few more small things (sorry!)

vllm/version.py

vllm/v1/engine/detokenizer.py

njhill

Thanks @afeldman-nm!

afeldman-nm added 2 commits March 11, 2025 15:41

stop-token fix

f2d934b

Signed-off-by: Andrew Feldman <[email protected]>

cleanup

7b6548d

Signed-off-by: Andrew Feldman <[email protected]>

mergify bot added the v1 label Mar 11, 2025

afeldman-nm changed the title ~~[V1~~ [V1] Detokenizer is not truncating output text in response to token-based stops detected in engine core; breaks tool call unit tests Mar 11, 2025

robertgshaw2-redhat changed the title ~~[V1] Detokenizer is not truncating output text in response to token-based stops detected in engine core; breaks tool call unit tests~~ [V1] Detokenizer: Respect Stop Tokens Mar 11, 2025

robertgshaw2-redhat reviewed Mar 11, 2025

vllm/v1/core/scheduler.py (outdated, resolved)

markmc reviewed Mar 11, 2025

vllm/v1/engine/detokenizer.py (outdated, resolved)

vllm/v1/engine/detokenizer.py (outdated, resolved)

vllm/v1/core/scheduler.py (outdated, resolved)

robertgshaw2-redhat reviewed Mar 11, 2025

vllm/v1/engine/detokenizer.py (outdated, resolved)

afeldman-nm marked this pull request as ready for review March 11, 2025 17:15

afeldman-nm requested review from WoosukKwon, alexm-redhat, comaniac and ywang96 as code owners March 11, 2025 17:15

njhill reviewed Mar 11, 2025

wip

beef4f8

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm added 4 commits March 11, 2025 19:15

refactor

1e88c84

Signed-off-by: Andrew Feldman <[email protected]>

refactor

0d0b713

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into tool

df8fa0d

wip on adding unit tests

876b7f8

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm added 4 commits March 12, 2025 22:55

passing stop token test

77f0d35

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into tool

b0dbeb1

refactor

0a4e10f

Signed-off-by: Andrew Feldman <[email protected]>

refactor

7aee185

Signed-off-by: Andrew Feldman <[email protected]>

njhill reviewed Mar 13, 2025

vllm/v1/engine/detokenizer.py (outdated, resolved)

njhill added this to the v0.8.0 milestone Mar 13, 2025

njhill added the bug label Mar 13, 2025

afeldman-nm added 3 commits March 13, 2025 04:40

Merge branch 'main' into tool

2777029

nick changes

a9be86c

Signed-off-by: Andrew Feldman <[email protected]>

Merge branch 'main' into tool

adc6806

formatting

c5cc2d2

Signed-off-by: Andrew Feldman <[email protected]>

afeldman-nm added 2 commits March 13, 2025 14:45

Merge branch 'main' into tool_merge

7f3beed

trusting engine core to identify stop token

77e9480

Signed-off-by: Andrew Feldman <[email protected]>

njhill reviewed Mar 13, 2025

vllm/version.py (outdated, resolved)

vllm/v1/engine/detokenizer.py (outdated, resolved)

vllm/v1/engine/detokenizer.py (outdated, resolved)

vllm/v1/engine/detokenizer.py (outdated, resolved)

afeldman-nm added 3 commits March 13, 2025 15:21

removed version fix

ca131ad

Signed-off-by: Andrew Feldman <[email protected]>

removed IncrementalDetokenizer fields

b3ab2ac

Signed-off-by: Andrew Feldman <[email protected]>

restructuring

8c9f802

Signed-off-by: Andrew Feldman <[email protected]>

njhill approved these changes Mar 13, 2025

Merge branch 'main' into tool_merge

af455b1

njhill added the ready label Mar 13, 2025

njhill changed the title ~~[V1] Detokenizer: Respect Stop Tokens~~ [V1] Detokenizer: Respect Stop Tokens with not include_stop_str_in_output Mar 13, 2025

njhill changed the title ~~[V1] Detokenizer: Respect Stop Tokens with not include_stop_str_in_output~~ [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output Mar 13, 2025

njhill enabled auto-merge (squash) March 13, 2025 17:12

njhill merged commit 02fcaa3 into vllm-project:main Mar 13, 2025
45 checks passed

afeldman-nm deleted the afeldman-nm/tool branch March 13, 2025 19:30

richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025

[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (vllm-project#14624)

eae482f

Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Richard Liu <[email protected]>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (vllm-project#14624)

7fb4940

Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (vllm-project#14624)

f204bde

Signed-off-by: Andrew Feldman <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (vllm-project#14624)

4458f73

Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
