
[Bugfix] Fix and add tests for GptOss reasoning parser#28000

Merged
yeqcharlotte merged 6 commits into vllm-project:main from
CentML:bugfix-gptoss-reasoning-parser
Nov 7, 2025

Conversation


@benchislett benchislett commented Nov 3, 2025

Purpose

GPT-OSS can escape structured outputs by emitting tokens between <|channel|>final and <|message|>, for example: <|start|>assistant<|channel|>final <|constrain|>JSON<|message|>content....

This PR updates the parser to look for the prefix <|start|>assistant<|channel|>final and then the suffix <|message|>. This doesn't cover all cases, such as tool calling (see #23120), but it does work around a few common issues.
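The prefix/suffix matching described above can be sketched as follows. This is an illustrative, simplified version of the fixed logic, not the actual vLLM implementation: the token IDs are placeholders, and the function and parameter names are hypothetical.

```python
# Hedged sketch of detecting the end of reasoning by matching a prefix
# token sequence (e.g. the tokens for "<|channel|>final") followed,
# possibly after extra tokens such as " <|constrain|>JSON", by a suffix
# token sequence (the tokens for "<|message|>").

def find_subsequence(haystack: list[int], needle: list[int], start: int = 0) -> int:
    """Return the first index >= start where needle occurs in haystack, else -1."""
    n = len(needle)
    for i in range(start, len(haystack) - n + 1):
        if haystack[i : i + n] == needle:
            return i
    return -1

def is_reasoning_end(token_ids: list[int],
                     prefix_ids: list[int],
                     suffix_ids: list[int]) -> bool:
    """Reasoning has ended once the prefix appears and the suffix
    appears somewhere after it, tolerating tokens in between."""
    start = find_subsequence(token_ids, prefix_ids)
    if start == -1:
        return False
    return find_subsequence(token_ids, suffix_ids, start + len(prefix_ids)) != -1
```

With placeholder IDs, `is_reasoning_end([1, 10, 11, 5, 20, 7], [10, 11], [20])` returns True even though token `5` sits between the prefix and the suffix, which is exactly the escape case the old exact-sequence match missed.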

Test Plan

Added a unit test for GptOssReasoningParser to target is_reasoning_end specifically. Future work can extend this to cover more cases as needed.

Test Result

Tests now passing.

I also ran the benchmark script:

python3 ../benchmarks/benchmark_serving_structured_output.py --backend openai-chat --model openai/gpt-oss-120b --dataset xgrammar_bench --structured-output-ratio 1.0 --request-rate 100 --num-prompts 100 --port 8046 --endpoint /openai/v1/chat/completions --output-len 900 --save-results

and observed 98% accuracy before this change (the same as with the ratio set to 0.0; at least one confirmed failure was due to this feature gap), and 100% accuracy after.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the GptOss reasoning parser, which previously struggled to correctly identify the end of a reasoning block when extra tokens were present between the final channel and the message tag. The implemented fix introduces a more flexible parsing mechanism that searches for a prefix and suffix separately within a defined token distance, which effectively resolves the issue and enhances parsing accuracy. The inclusion of a comprehensive unit test suite for is_reasoning_end is a valuable addition, as it thoroughly validates the new logic across different scenarios.

I have one suggestion to enhance the performance of the is_reasoning_end method, which could become a performance bottleneck with long input sequences.

@benchislett benchislett added the bug Something isn't working label Nov 3, 2025
# The model can output some special tokens between "final" and "<|message|>"
# So we need to look for both sequences to determine the end of reasoning.
self.reasoning_end_token_ids_prefix = self.model_tokenizer.encode(
    "<|start|>assistant<|channel|>final"
)


The model will only output <|start|>assistant if it decides to output a reasoning message; in some situations it may jump straight to the final message. So it may be more reasonable to match just <|channel|>final instead, as that covers both situations.

To cover tool calling scenarios, as soon as you see to= in the header, then the message is a tool call and no longer reasoning.

Both situations can be covered by the ending suffix of <|message|> though.


@benchislett benchislett Nov 4, 2025


Updated to cover the first case. I think it would be easiest for me to leave any tool-calling updates to follow-up work, as I have no context on how we handle them in vLLM. I'm also not entirely sure whether this code path is even used anymore since the work on structured tags.

Signed-off-by: Benjamin Chislett <[email protected]>
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Nov 5, 2025

@alecsolder alecsolder left a comment


Looks good to me. I have a follow-up PR ready soon which will adjust this for tool calling as well.

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) November 6, 2025 07:47
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@yeqcharlotte yeqcharlotte merged commit 1890321 into vllm-project:main Nov 7, 2025
45 checks passed
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
@benchislett benchislett deleted the bugfix-gptoss-reasoning-parser branch January 6, 2026 19:11

Labels

bug Something isn't working gpt-oss Related to GPT-OSS models ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants