-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output
#14624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
markmc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with any of this. I'd need to study V0 behavior before being confident in this approach, basically to answer my questions in comments. Have you compared to V0?
|
Not quite finished yet (need to add unit tests, refactor, address feedback) but opening for review since we are trying to finish ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @afeldman-nm.
My thought is similar to @markmc and @robertgshaw2-redhat's.
My suggestion would be:
- Pass a bool
stop_terminatedflag toupdate().. which when can be called withfinish_reason == STOP - Inside
update(), ifstop_terminated and self.include_stop_str_in_outputis True, skip detokenizing the last token innew_token_ids(which is typically the only token but might not be). We still want to add that token toself.token_idsbut don't need to do anything else for it that's inside the for loop.
I think that would also address @robertgshaw2-redhat's brittleness concerns.
An alternative suggestion would be to just have the detokenizer also handle the eos and stop token id evaluation like it does for stop strings, and move it out of the engine. The downside of this is that we would be generating an extra token in those cases (and eos is most common) so I'd probably still prefer to avoid it.
|
On second thoughts, maybe @robertgshaw2-redhat's suggestion is better, basically to do the eos and stop token id checking in the An annoying thing about this is that we don't need to send an abort and it would be preferable not to. But then we need to have additional conditional logic in the calling output processor method to only send the abort if the stop_reason is a |
Signed-off-by: Andrew Feldman <[email protected]>
I implemented roughly a compromise between your and Robert's suggestions - the last token skips detokenization if a stop token was detected, and the detokenizer's stop-token check validates that the last token is actually a valid EOS or stop token. |
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
|
JFYI, currently adding unit tests, still WIP |
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
|
Is this ready to merge? |
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afeldman-nm I think the changes can be much smaller, I don't think we need any new methods.
I've gone back to thinking my first suggestion would be better. I made the changes in a commit here to illustrate: njhill@bc26b30, feel free to pull them in.
Signed-off-by: Andrew Feldman <[email protected]>
Good restructuring @njhill . Minor nit, I incorporated Rob's suggestion (actually validating that the engine core stop was triggered by an EOS or an element of |
|
I think that this PR is /ready |
Signed-off-by: Andrew Feldman <[email protected]>
Thanks @afeldman-nm. IMHO though the extra validation (and associated additional complexity) here is unnecessary. Passing |
Signed-off-by: Andrew Feldman <[email protected]>
Okay @njhill , I made a change to reflect your suggestion. |
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afeldman-nm just a few more small things (sorry!)
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @afeldman-nm!
not include_stop_str_in_output
not include_stop_str_in_outputnot include_stop_str_in_output
vllm-project#14624) Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: Richard Liu <[email protected]>
vllm-project#14624) Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
vllm-project#14624) Signed-off-by: Andrew Feldman <[email protected]>
vllm-project#14624) Signed-off-by: Andrew Feldman <[email protected]> Signed-off-by: Mu Huai <[email protected]>
This PR plumbs the engine core
finish_reasoninto theincremental_detokenizer.update()method. The invariant is that iffinish_reason == STOPin the engine core output, the engine core must have detected a token-based stop condition (EOS or stop-token.) The detokenizer will truncate the most recent token's detokenized text from the output text.FIX #14623