[Responses API] Structured output + reasoning via structural tag embedding#35873
Closed
will-deines wants to merge 3 commits intovllm-project:mainfrom
Closed
[Responses API] Structured output + reasoning via structural tag embedding#35873will-deines wants to merge 3 commits intovllm-project:mainfrom
will-deines wants to merge 3 commits intovllm-project:mainfrom
Conversation
…dding When a user requests JSON schema enforcement (text.format.type=json_schema) with a reasoning model (GPT-OSS), the grammar constraint was never scoped to the final output channel. This caused grammar bitmasks to be applied from token 0, clobbering reasoning output. Fix by embedding content constraints (json_schema, json_object, regex, grammar, choice) inside the structural tag's <|channel|>final region using xgrammar's native TriggeredTagsFormat support. This ensures grammar enforcement only applies within the final output region, not during reasoning. Also: - Handle text.format.type=json_object (was silently ignored) - Fix streaming + json_schema alias bug (.model_dump() dropped schema alias) - Apply reasoning channel tags even when no structured output is requested
Contributor
There was a problem hiding this comment.
Code Review
This pull request enhances structured output capabilities, especially for reasoning models, by embedding content constraints within structural tags. It also adds support for json_object format and fixes a streaming bug. My review found one high-severity issue where options for structured output generation were being dropped when combining with reasoning. I've suggested a fix to preserve these options.
…nstraints When creating a new StructuredOutputsParams with the structural_tag, use dataclasses.replace() to clear content constraint fields while preserving user-specified options like disable_any_whitespace, disable_fallback, disable_additional_properties, and whitespace_pattern.
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
text.format.type=json_schema) with a GPT-OSS reasoning model, the grammar constraint is now scoped to the<|channel|>finalregion via xgrammar'sTriggeredTagsFormat. Previously, grammar bitmasks were applied from token 0, clobbering reasoning output.json_objectformat:text.format.type=json_objectwas silently ignored in the Responses API. Now producesStructuredOutputsParams(json_object=True), matching chat completions behavior..model_dump()in the streaming path that dropped theschema→schema_Pydantic alias, causingResponseCreatedEventdeserialization failures.struct_out is Nonebranch).Approach
Rather than modifying
StructuredOutputsParamsto allow multiple simultaneous constraint types (which would require deep changes to validation, backends, and dispatch), we embed the content constraint inside the structural tag's<|channel|>finaltag.xgrammar's
TagFormat.contentfield already accepts a discriminated union ofJSONSchemaFormat,GrammarFormat,RegexFormat, etc. (defined inxgrammar/structural_tag.py). The infrastructure to "apply JSON schema grammar only within the<|channel|>finalregion" already exists — we just wire it up from the Responses API.This means:
StructuredOutputsParamskeeps its existing mutual-exclusivity invariant (one constraint type)structural_tag, which internally contains both reasoning channel enforcement AND the content constraint scoped to the final channeldisable_any_whitespace,disable_fallback, etc.) are preserved viadataclasses.replace()Decisions We Made That Can Be Debated
1. Embed constraint inside structural tag vs. allow multiple constraint types on
StructuredOutputsParamsWhat we chose: When a reasoning parser is active and a content constraint (json_schema, regex, grammar, choice) is present, we convert the content constraint into an xgrammar
contentformat dict, embed it in the<|channel|>finaltag within the structural tag, then clear the original constraint fields. The finalStructuredOutputsParamshas onlystructural_tagset.Alternative: Modify
StructuredOutputsParamsto support multiple simultaneous constraint types (e.g.structural_tag+json). This would avoid the mid-pipeline mutation pattern where we clear fields after embedding them, but requires changes to validation logic, backend dispatch inStructuredOutputManager, and every guided decoding backend's understanding of what "one constraint" means.Why we chose this: xgrammar's
TagFormat.contentfield already supports this composition natively — the infrastructure exists and is tested. The mutual-exclusivity invariant onStructuredOutputsParamsis load-bearing across the entire structured output stack, and relaxing it has a large blast radius.What reviewers might disagree with: The mid-pipeline mutation (clearing
json/regex/etc. after embedding) meansStructuredOutputsParamsno longer reflects what the user originally requested. If downstream code inspects these fields (e.g., for logging, metrics, or error messages), it will seeNoneinstead of the original constraint. An alternative could be to construct a freshStructuredOutputsParams(structural_tag=...)rather than mutating viadataclasses.replace().2. Fix
text.formatpath rather than redirecting users tostructured_outputsfieldWhat we chose: We fix the standard OpenAI
text.formatpath so thatjson_schema,json_object, and streaming all work correctly. Users can use eithertext.format(OpenAI-compatible) or the vLLM-specificstructured_outputsfield (#33709).Alternative: Only support structured output through the vLLM-specific
structured_outputsfield and treattext.formatas a passthrough/echo-only field (the status quo before this PR, wherejson_objectwas silently ignored).Context: This is an area of active debate. In #33709, @yeqcharlotte and @chaunceyjiang questioned why structured output wasn't going through
text.formatinstead of a separate field. In #33381, @chaunceyjiang argued vLLM-specific extensions should go through the OpenResponses extension mechanism. Meanwhile, @alecsolder defended the separate field for cross-provider reusability and separation of concerns. In #19097,vllm_-prefixed types were proposed but the RFC was auto-closed without implementation.Why we chose this: Users coming from the OpenAI SDK will naturally use
text.format.type=json_schema— it should just work. Thestructured_outputsfield is additive for vLLM-specific capabilities (grammar, regex, choice) thattext.formatcan't express. Fixing both paths costs little and prevents user confusion.3. Remove
.model_dump()vs. addby_alias=Truefor streaming alias bugWhat we chose: Remove the
.model_dump()call in the streaming path and pass theResponsesResponsePydantic object directly toResponseCreatedEvent, matching howResponseCompletedEventalready works. This is the approach from #34611.Alternative: Keep
.model_dump()but addby_alias=Trueso Pydantic serializesschema_as"schema". This is the approach from #26356, which has community confirmation of working.Why we chose this: Removing the unnecessary dict round-trip eliminates the entire class of alias bugs rather than patching one instance. This is consistent with @qandrew's own #26185 which previously removed a
.model_dump()call on theResponseCompletedEventpath for the same category of issue. Theby_alias=Trueapproach is fragile — any future alias field would break again if someone forgets the flag.4. Apply reasoning channel tags even when no structured output is requested
What we chose: When
struct_out is Noneand a reasoning parser is active, we now create aStructuredOutputsParams(structural_tag=...)with just the reasoning channel tags. Previously, theprepare_structured_tag()block was only entered whenstruct_outwas already aStructuredOutputsParamsinstance.Alternative: Keep the existing behavior where reasoning channel tags are only applied when the user explicitly requests some form of structured output.
Why we chose this: Without structural tags, GPT-OSS models emit raw Harmony format (
<|channel|>analysis<|message|>...) that the reasoning parser must post-hoc parse. With structural tags, xgrammar enforces the channel structure at decode time, which is more robust and enables future optimizations. This also means the reasoning parser'sis_reasoning_endstate machine (which has had multi-turn bugs per #34454) is supplemented by grammar-level enforcement.What reviewers might disagree with: This changes default behavior for all GPT-OSS requests that don't request structured output. If a model produces valid output without structural tags but would be over-constrained with them, this could cause regressions. We don't have e2e validation of this path yet.
5.
json_objectmapped to{"type": "object"}in structural tag contentWhat we chose: In
_constraint_to_content_format(),json_object=Trueis converted to{"type": "json_schema", "json_schema": {"type": "object"}}for embedding in the structural tag.Alternative: Map it to a dedicated
json_objectcontent format type if xgrammar supports one, or skip embedding entirely and let the existingjson_objecthandling in the structured output backend handle it outside the structural tag.Why we chose this: xgrammar's
TagFormat.contentexpects one of its known format types (json_schema, regex, grammar, etc.).{"type": "object"}is the minimal JSON schema that enforces "output must be a JSON object" — semantically equivalent tojson_objectmode. This ensures the constraint is properly scoped to the<|channel|>finalregion for reasoning models rather than being applied globally.6. Adding
final_content_formatparameter to the base classprepare_structured_tag()What we chose: We added
final_content_format: dict | None = Noneas an optional parameter onReasoningParser.prepare_structured_tag()in the base class, with a default ofNonethat preserves backward compatibility.Alternative: Only add the parameter on
GPTOSSReasoningParserand handle the dispatch inserving.pywith a type check or capability flag. Or create a separate method likeprepare_structured_tag_with_constraint().Why we chose this: The base class change is backward-compatible (default
None, existing implementations don't need changes). The concept of "scope this content constraint to the model's final output region" is generic — it's not GPT-OSS-specific. Other reasoning models (Qwen3, DeepSeek-R1, future models) with structural tag support would benefit from the same interface. Keeping it on the base class establishes a clean contract.What reviewers might disagree with: This couples content constraint format knowledge (xgrammar dict format) to the reasoning parser interface. If vLLM ever supports a non-xgrammar structured output backend, this dict format may not apply. A more abstract interface (e.g., passing
StructuredOutputsParamsdirectly) might be more future-proof.Related Issues, PRs, and RFCs
Directly Addressed by This PR
schemafield becomesNonein streaming with json_schemaschema_/schemaalias bug in streaming. The.model_dump()removal in this PR fixes it..model_dump()in streaming path. We adopt this approach.by_alias=True). We prefer #34611's approach (pass objects directly).texttype response_format receivedtype: "text"passthrough. Ourjson_objecthandling follows the same pattern.json_objectgap in the Responses API could produce similar errors. Our Step 1 prevents this.Foundation This PR Builds On
structured_outputsfor responses APIstructured_outputsfield to ResponsesRequest. Our work builds on this.to_sampling_params()infrastructure on ResponsesRequest.Parser/ParserManagerand the structural tag preparation block we're extending.Related RFCs
structured_outputsfield. We reuseStructuredOutputsParamsrather than creating new types.structured_outputsfield (already merged in #33709) and also fix the standardtext.formatpath. No new protocol extensions.structured_outputsas instance field on ResponsesRequeststructured_outputsfrom local var to field for tool parser mutation. Compatible with our changes; we support both the field path and thetext.formatpath.Changes
vllm/entrypoints/openai/responses/protocol.pyjson_objecthandling into_sampling_params()vllm/entrypoints/openai/responses/serving.py_constraint_to_content_format()helper; rewire structural tag preparation block; fix streaming.model_dump()vllm/reasoning/abs_reasoning_parsers.pyfinal_content_formatparam toprepare_structured_tag()base classvllm/reasoning/gptoss_reasoning_parser.pyfinal_content_format— append `<tests/entrypoints/openai/responses/test_structured_output.py_constraint_to_content_formattests/v1/structured_output/test_gptoss_structural_tags.pytests/entrypoints/openai/responses/test_sampling_params.pyjson_objecttest.gitignorelocal_test/Test plan
pytest tests/entrypoints/openai/responses/test_structured_output.py tests/v1/structured_output/test_gptoss_structural_tags.py tests/entrypoints/openai/responses/test_sampling_params.py -vOut of Scope (follow-ups)
serving_chat.pynever callsprepare_structured_tag(). Same fix pattern applies but is a separate PR targeting the chat completions path.strictfield forwarding fromResponseFormatTextJSONSchemaConfig— low priority, vLLM always enforces strictly.structured_outputsshould go through extension mechanism.structured_outputsas instance field (#33249) — promotesstructured_outputsfrom local variable to field for tool parser mutation. Compatible with our changes but independent concern.