

@afeldman-nm
Contributor

@afeldman-nm afeldman-nm commented Mar 11, 2025

This PR plumbs the engine core finish_reason into the incremental_detokenizer.update() method. The invariant is that if finish_reason == STOP in the engine core output, the engine core must have detected a token-based stop condition (EOS or a stop token). In that case, the detokenizer truncates the most recent token's detokenized text from the output text.
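To make the invariant concrete, here is a minimal toy sketch of the truncation approach. It assumes a hypothetical `ToyIncrementalDetokenizer` in which every token id maps to a fixed string, plus a local stand-in `FinishReason` enum. vLLM's real incremental detokenizer handles multi-token characters and prefix offsets, which this sketch ignores.

```python
from enum import Enum
from typing import Optional


class FinishReason(Enum):
    """Hypothetical stand-in for the engine core's finish reasons."""
    STOP = "stop"
    LENGTH = "length"


class ToyIncrementalDetokenizer:
    """Hypothetical toy detokenizer: each token id maps to a fixed string."""

    def __init__(self, vocab: dict[int, str]) -> None:
        self.vocab = vocab
        self.token_ids: list[int] = []
        self.output_text = ""

    def update(self, new_token_ids: list[int],
               finish_reason: Optional[FinishReason]) -> None:
        # Offset in output_text where the most recent token's text begins.
        last_token_start = len(self.output_text)
        for token_id in new_token_ids:
            self.token_ids.append(token_id)
            last_token_start = len(self.output_text)
            self.output_text += self.vocab[token_id]
        # Assumed invariant: STOP from the engine core means a token-based
        # stop (EOS or stop token), so the last new token is the stop token.
        if finish_reason == FinishReason.STOP and new_token_ids:
            # Truncate the stop token's text; its id stays in token_ids.
            self.output_text = self.output_text[:last_token_start]


detok = ToyIncrementalDetokenizer({1: "Hello", 2: " world", 0: "</s>"})
detok.update([1], None)
detok.update([2, 0], FinishReason.STOP)
print(repr(detok.output_text))  # 'Hello world'
print(detok.token_ids)          # [1, 2, 0]
```

The stop token's id is still recorded in `token_ids`; only its text is dropped from `output_text`.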

FIX #14623

Signed-off-by: Andrew Feldman <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 11, 2025
@afeldman-nm afeldman-nm changed the title [V1 [V1] Detokenizer is not truncating output text in response to token-based stops detected in engine core; breaks tool call unit tests Mar 11, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title [V1] Detokenizer is not truncating output text in response to token-based stops detected in engine core; breaks tool call unit tests [V1] Detokenizer: Respect Stop Tokens Mar 11, 2025
Member

@markmc markmc left a comment

I'm not super familiar with any of this. I'd need to study V0 behavior before being confident in this approach, mainly to answer the questions in my comments. Have you compared this with V0?

@afeldman-nm afeldman-nm marked this pull request as ready for review March 11, 2025 17:15
@afeldman-nm
Contributor Author

Not quite finished yet (need to add unit tests, refactor, address feedback) but opening for review since we are trying to finish ASAP.

Member

@njhill njhill left a comment

Thanks @afeldman-nm.

My thought is similar to @markmc and @robertgshaw2-redhat's.

My suggestion would be:

  • Pass a bool stop_terminated flag to update(), set to True when it is called with finish_reason == STOP.
  • Inside update(), if stop_terminated is True and self.include_stop_str_in_output is False, skip detokenizing the last token in new_token_ids (which is typically the only token, but might not be). We still want to add that token to self.token_ids, but we don't need to do anything else inside the for loop for it.
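The suggestion above can be sketched as follows, again with a hypothetical toy vocabulary instead of vLLM's real incremental detokenization. The caller is assumed to compute `stop_terminated` as `finish_reason == STOP`; the attribute names mirror the comment, but the `ToyDetokenizer` class itself is illustrative.

```python
from typing import Optional


class ToyDetokenizer:
    """Hypothetical toy detokenizer: each token id maps to a fixed string."""

    def __init__(self, vocab: dict[int, str],
                 include_stop_str_in_output: bool = False) -> None:
        self.vocab = vocab
        self.include_stop_str_in_output = include_stop_str_in_output
        self.token_ids: list[int] = []
        self.output_text = ""

    def update(self, new_token_ids: list[int], stop_terminated: bool) -> None:
        skipped_stop_token_id: Optional[int] = None
        if (stop_terminated and not self.include_stop_str_in_output
                and new_token_ids):
            # The last token is the stop token: hold it back from
            # detokenization (it may not be the only new token).
            skipped_stop_token_id = new_token_ids[-1]
            new_token_ids = new_token_ids[:-1]

        for token_id in new_token_ids:
            self.token_ids.append(token_id)
            self.output_text += self.vocab[token_id]

        if skipped_stop_token_id is not None:
            # Still record the stop token id, but add no text for it.
            self.token_ids.append(skipped_stop_token_id)


vocab = {5: "Hi", 6: "!", 2: "<eos>"}
hidden = ToyDetokenizer(vocab)
hidden.update([5, 6, 2], stop_terminated=True)
print(repr(hidden.output_text), hidden.token_ids)  # 'Hi!' [5, 6, 2]

shown = ToyDetokenizer(vocab, include_stop_str_in_output=True)
shown.update([5, 6, 2], stop_terminated=True)
print(repr(shown.output_text))  # 'Hi!<eos>'
```

Compared with truncating text after the fact, skipping the stop token up front means there is never any stop-token text to remove from `output_text`.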

I think that would also address @robertgshaw2-redhat's brittleness concerns.


An alternative suggestion would be to have the detokenizer also handle the EOS and stop token id evaluation, as it does for stop strings, and move it out of the engine. The downside is that we would generate an extra token in those cases (and EOS is the most common one), so I'd probably still prefer to avoid it.

@njhill
Member

njhill commented Mar 11, 2025

On second thought, maybe @robertgshaw2-redhat's suggestion is better: do the EOS and stop token id checking in the update method and handle selective detokenization of the final token accordingly, while also leaving the check in the engine.

An annoying thing about this is that we don't need to send an abort in these cases, and it would be preferable not to. But then we'd need additional conditional logic in the calling output processor method to send the abort only if the stop_reason is a str (not if it's an int or None).
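The extra conditional logic mentioned above could look roughly like this sketch. It assumes vLLM's `stop_reason` convention (a `str` for a matched stop string, an `int` for a stop token id, `None` otherwise, for example on EOS); the `requests_to_abort` helper and the `(request_id, stop_reason)` pairs are hypothetical.

```python
from typing import Optional, Union

StopReason = Optional[Union[int, str]]


def requests_to_abort(finished: list[tuple[str, StopReason]]) -> list[str]:
    """Hypothetical helper: pick finished requests the engine core must abort."""
    to_abort = []
    for request_id, stop_reason in finished:
        # Only a str stop_reason comes from a stop string matched in the
        # detokenizer; the engine core has not stopped those requests yet.
        # An int (stop token id) or None (e.g. EOS) means the engine core
        # already finished the request itself, so no abort is needed.
        if isinstance(stop_reason, str):
            to_abort.append(request_id)
    return to_abort


finished = [("req-0", None), ("req-1", 128009), ("req-2", "###")]
print(requests_to_abort(finished))  # ['req-2']
```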

@afeldman-nm
Contributor Author

I implemented roughly a compromise between your and Robert's suggestions: the last token skips detokenization if a stop token was detected, and the detokenizer's stop-token check validates that the last token is actually a valid EOS or stop token.
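The validation step described above amounts to a check like the following sketch. The function name and arguments are hypothetical, not the code in this PR; in an update() method, a failed check under stop_terminated=True could, for example, raise an error.

```python
from collections.abc import Collection
from typing import Optional


def is_valid_token_stop(new_token_ids: list[int],
                        eos_token_id: Optional[int],
                        stop_token_ids: Collection[int]) -> bool:
    """Hypothetical check: is the last new token an EOS or a stop token?"""
    if not new_token_ids:
        return False
    last_token_id = new_token_ids[-1]
    return last_token_id == eos_token_id or last_token_id in stop_token_ids


print(is_valid_token_stop([11, 2], eos_token_id=2, stop_token_ids=[]))      # True
print(is_valid_token_stop([11, 7], eos_token_id=2, stop_token_ids={7, 8}))  # True
print(is_valid_token_stop([11, 9], eos_token_id=2, stop_token_ids={7, 8}))  # False
```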

@afeldman-nm
Contributor Author

JFYI, currently adding unit tests, still WIP

@robertgshaw2-redhat
Collaborator

Is this ready to merge?

Member

@njhill njhill left a comment

@afeldman-nm I think the changes can be much smaller, I don't think we need any new methods.

I've gone back to thinking my first suggestion would be better. I made the changes in a commit here to illustrate: njhill@bc26b30, feel free to pull them in.

@njhill njhill added this to the v0.8.0 milestone Mar 13, 2025
@njhill njhill added the bug Something isn't working label Mar 13, 2025
@afeldman-nm
Contributor Author

Good restructuring, @njhill. Minor nit: I incorporated Rob's suggestion (actually validating that the engine core stop was triggered by an EOS or an element of stop_token_ids) into your commit. This restructures things slightly, but overall I was able to avoid adding any helper functions and keep things simple.

@afeldman-nm
Contributor Author

I think that this PR is /ready

@njhill
Member

njhill commented Mar 13, 2025

Thanks @afeldman-nm. IMHO though the extra validation (and associated additional complexity) here is unnecessary. Passing stop_terminated=True to that method is telling it that the request is stopping due to the last token being a stop token (which in turn has been communicated via STOP finish reason returned by the engine). So it seems a bit arbitrary to re-check. I really think we should keep it simpler.

@afeldman-nm
Contributor Author

Okay @njhill, I made a change to reflect your suggestion.

Member

@njhill njhill left a comment

@afeldman-nm just a few more small things (sorry!)

Member

@njhill njhill left a comment

Thanks @afeldman-nm!

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 13, 2025
@njhill njhill changed the title [V1] Detokenizer: Respect Stop Tokens [V1] Detokenizer: Respect Stop Tokens with not include_stop_str_in_output Mar 13, 2025
@njhill njhill changed the title [V1] Detokenizer: Respect Stop Tokens with not include_stop_str_in_output [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output Mar 13, 2025
@njhill njhill enabled auto-merge (squash) March 13, 2025 17:12
@njhill njhill merged commit 02fcaa3 into vllm-project:main Mar 13, 2025
45 checks passed
@afeldman-nm afeldman-nm deleted the afeldman-nm/tool branch March 13, 2025 19:30
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed v1


Development

Successfully merging this pull request may close these issues.

[Bug]: [V1] Detokenizer does not trunctate EOS/stop-token from output text

4 participants