[Bugfix] Fix and add tests for GptOss reasoning parser #28000
yeqcharlotte merged 6 commits into vllm-project:main
Conversation
Signed-off-by: Benjamin Chislett <[email protected]>
Code Review
This pull request addresses a bug in the GptOss reasoning parser, which previously failed to identify the end of a reasoning block when extra tokens appeared between the final channel and the message tag. The fix introduces a more flexible parsing mechanism that searches for a prefix and a suffix separately within a defined token distance, which resolves the issue and improves parsing accuracy. The comprehensive unit test suite for `is_reasoning_end` is a valuable addition, thoroughly validating the new logic across different scenarios.
I have one suggestion to improve the performance of the `is_reasoning_end` method, which could become a bottleneck with long input sequences.
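The fix described above (search for a prefix, then require the suffix within a bounded token distance) can be sketched roughly as follows. All names and the `max_gap` value are illustrative assumptions, not vLLM's exact implementation:

```python
def is_reasoning_end(token_ids, prefix_ids, suffix_ids, max_gap=8):
    """Sketch of the fixed check: find the prefix token sequence, then
    require the suffix sequence within max_gap tokens after it.

    Hypothetical stand-in for the parser method; the real signature and
    gap limit in vLLM may differ.
    """
    n, p_len, s_len = len(token_ids), len(prefix_ids), len(suffix_ids)
    for i in range(n - p_len + 1):
        if token_ids[i:i + p_len] != prefix_ids:
            continue
        start = i + p_len
        # Allow up to max_gap extra tokens between prefix and suffix,
        # e.g. "<|constrain|>JSON" emitted between "final" and "<|message|>".
        for j in range(start, min(start + max_gap, n - s_len) + 1):
            if token_ids[j:j + s_len] == suffix_ids:
                return True
    return False
```

The naive scan above is O(n) per call; the reviewer's performance concern could be addressed by, for example, scanning only newly appended tokens or caching the last match position, depending on how the method is invoked.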
```python
# The model can output some special tokens between "final" and "<|message|>"
# So we need to look for both sequences to determine the end of reasoning.
self.reasoning_end_token_ids_prefix = self.model_tokenizer.encode(
    "<|start|>assistant<|channel|>final"
)
```
The model will only output `<|start|>assistant` if it decides to emit a reasoning message; in some situations it may jump straight to the final message. It may therefore be more reasonable to match just `<|channel|>final`, as that covers both situations.
To cover tool-calling scenarios: as soon as you see `to=` in the header, the message is a tool call and no longer reasoning.
Both situations can be covered by the ending suffix `<|message|>`, though.
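The reviewer's classification rules above can be sketched as a small helper. The function name and the string-level approach are hypothetical illustrations (the real parser operates on token ids), but the decision order mirrors the comment: a `to=` recipient marks a tool call, and `<|channel|>final` marks the final user-facing message:

```python
def classify_header(header: str) -> str:
    """Classify a GPT-OSS message header (the text between <|start|> and
    <|message|>). Hypothetical helper illustrating the review comment,
    not vLLM code."""
    if "to=" in header:
        # A recipient in the header means the message is a tool call,
        # so it is no longer reasoning.
        return "tool_call"
    if "<|channel|>final" in header:
        # The final channel ends the reasoning block.
        return "final"
    # Anything else (e.g. the analysis channel) is still reasoning.
    return "reasoning"
```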
Updated to cover the first case. I think it would be easiest for me to leave any tool-calling updates to follow-up work, as I have no context on how we handle them in vLLM. I'm also not entirely sure this code pathway is even still used since the work using structured tags.
Signed-off-by: Benjamin Chislett <[email protected]>
alecsolder left a comment
Looks good to me, I have a PR ready soon as a follow up which will adjust this for tool calling as well
Purpose
GPT-OSS can escape structured outputs by emitting tokens between `<|channel|>final` and `<|message|>`, for example: `<|start|>assistant<|channel|>final <|constrain|>JSON<|message|>content...`. This PR updates the parser to look for `<|start|>assistant<|channel|>final` and then the suffix `<|message|>`. This doesn't cover all cases, such as tool calling (see #23120), but it works around a few common issues.
Test Plan
Added a unit test for GptOssReasoningParser targeting `is_reasoning_end` specifically. Future work can extend this to cover more cases as needed.
Test Result
Tests now passing.
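The added tests themselves aren't quoted in this conversation; a minimal sketch of the kind of parametrized suite described, using made-up token ids and a toy stand-in for the parser logic (not vLLM's actual `GptOssReasoningParser`):

```python
# Illustrative stand-ins: PREFIX plays the role of the encoded
# "<|channel|>final" and SUFFIX the encoded "<|message|>".
PREFIX = [101, 102, 103]
SUFFIX = [200]

def reasoning_ended(ids):
    """True if PREFIX occurs and SUFFIX occurs somewhere after it."""
    for i in range(len(ids) - len(PREFIX) + 1):
        if ids[i:i + len(PREFIX)] == PREFIX:
            rest = ids[i + len(PREFIX):]
            for j in range(len(rest) - len(SUFFIX) + 1):
                if rest[j:j + len(SUFFIX)] == SUFFIX:
                    return True
    return False

CASES = [
    ([1, 101, 102, 103, 200], True),        # prefix immediately followed by suffix
    ([1, 101, 102, 103, 7, 8, 200], True),  # extra tokens between prefix and suffix
    ([1, 101, 102, 103], False),            # suffix never arrives
    ([1, 200, 101, 102, 103], False),       # suffix only before the prefix
]

for ids, expected in CASES:
    assert reasoning_ended(ids) == expected
```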
I also ran the benchmark script, and see 98% accuracy before this change (the same as with the ratio set to 0.0; at least one failure case was confirmed to be due to this feature gap) and 100% accuracy after.