Skip to content

[Responses API] Structured output + reasoning via structural tag embedding#35873

Closed
will-deines wants to merge 3 commits intovllm-project:mainfrom
will-deines:worktree-responses-structured-output
Closed

[Responses API] Structured output + reasoning via structural tag embedding#35873
will-deines wants to merge 3 commits intovllm-project:mainfrom
will-deines:worktree-responses-structured-output

Conversation

@will-deines
Copy link
Copy Markdown

@will-deines will-deines commented Mar 3, 2026

Summary

  • Embed content constraints in structural tags: When a user requests JSON schema enforcement (text.format.type=json_schema) with a GPT-OSS reasoning model, the grammar constraint is now scoped to the <|channel|>final region via xgrammar's TriggeredTagsFormat. Previously, grammar bitmasks were applied from token 0, clobbering reasoning output.
  • Handle json_object format: text.format.type=json_object was silently ignored in the Responses API. Now produces StructuredOutputsParams(json_object=True), matching chat completions behavior.
  • Fix streaming + json_schema alias bug: Remove .model_dump() in the streaming path that dropped the schemaschema_ Pydantic alias, causing ResponseCreatedEvent deserialization failures.
  • Apply reasoning channel tags unconditionally: When a reasoning parser is active but no structured output is requested, reasoning channel tags are still applied (the struct_out is None branch).

Approach

Rather than modifying StructuredOutputsParams to allow multiple simultaneous constraint types (which would require deep changes to validation, backends, and dispatch), we embed the content constraint inside the structural tag's <|channel|>final tag.

xgrammar's TagFormat.content field already accepts a discriminated union of JSONSchemaFormat, GrammarFormat, RegexFormat, etc. (defined in xgrammar/structural_tag.py). The infrastructure to "apply JSON schema grammar only within the <|channel|>final region" already exists — we just wire it up from the Responses API.

This means:

  • StructuredOutputsParams keeps its existing mutual-exclusivity invariant (one constraint type)
  • The constraint type used is structural_tag, which internally contains both reasoning channel enforcement AND the content constraint scoped to the final channel
  • xgrammar handles the compilation natively — no custom grammar composition needed
  • User-specified options (disable_any_whitespace, disable_fallback, etc.) are preserved via dataclasses.replace()

Decisions We Made That Can Be Debated

1. Embed constraint inside structural tag vs. allow multiple constraint types on StructuredOutputsParams

What we chose: When a reasoning parser is active and a content constraint (json_schema, regex, grammar, choice) is present, we convert the content constraint into an xgrammar content format dict, embed it in the <|channel|>final tag within the structural tag, then clear the original constraint fields. The final StructuredOutputsParams has only structural_tag set.

Alternative: Modify StructuredOutputsParams to support multiple simultaneous constraint types (e.g. structural_tag + json). This would avoid the mid-pipeline mutation pattern where we clear fields after embedding them, but requires changes to validation logic, backend dispatch in StructuredOutputManager, and every guided decoding backend's understanding of what "one constraint" means.

Why we chose this: xgrammar's TagFormat.content field already supports this composition natively — the infrastructure exists and is tested. The mutual-exclusivity invariant on StructuredOutputsParams is load-bearing across the entire structured output stack, and relaxing it has a large blast radius.

What reviewers might disagree with: The mid-pipeline mutation (clearing json/regex/etc. after embedding) means StructuredOutputsParams no longer reflects what the user originally requested. If downstream code inspects these fields (e.g., for logging, metrics, or error messages), it will see None instead of the original constraint. An alternative could be to construct a fresh StructuredOutputsParams(structural_tag=...) rather than mutating via dataclasses.replace().

2. Fix text.format path rather than redirecting users to structured_outputs field

What we chose: We fix the standard OpenAI text.format path so that json_schema, json_object, and streaming all work correctly. Users can use either text.format (OpenAI-compatible) or the vLLM-specific structured_outputs field (#33709).

Alternative: Only support structured output through the vLLM-specific structured_outputs field and treat text.format as a passthrough/echo-only field (the status quo before this PR, where json_object was silently ignored).

Context: This is an area of active debate. In #33709, @yeqcharlotte and @chaunceyjiang questioned why structured output wasn't going through text.format instead of a separate field. In #33381, @chaunceyjiang argued vLLM-specific extensions should go through the OpenResponses extension mechanism. Meanwhile, @alecsolder defended the separate field for cross-provider reusability and separation of concerns. In #19097, vllm_-prefixed types were proposed but the RFC was auto-closed without implementation.

Why we chose this: Users coming from the OpenAI SDK will naturally use text.format.type=json_schema — it should just work. The structured_outputs field is additive for vLLM-specific capabilities (grammar, regex, choice) that text.format can't express. Fixing both paths costs little and prevents user confusion.

3. Remove .model_dump() vs. add by_alias=True for streaming alias bug

What we chose: Remove the .model_dump() call in the streaming path and pass the ResponsesResponse Pydantic object directly to ResponseCreatedEvent, matching how ResponseCompletedEvent already works. This is the approach from #34611.

Alternative: Keep .model_dump() but add by_alias=True so Pydantic serializes schema_ as "schema". This is the approach from #26356, which has community confirmation of working.

Why we chose this: Removing the unnecessary dict round-trip eliminates the entire class of alias bugs rather than patching one instance. This is consistent with @qandrew's own #26185 which previously removed a .model_dump() call on the ResponseCompletedEvent path for the same category of issue. The by_alias=True approach is fragile — any future alias field would break again if someone forgets the flag.

4. Apply reasoning channel tags even when no structured output is requested

What we chose: When struct_out is None and a reasoning parser is active, we now create a StructuredOutputsParams(structural_tag=...) with just the reasoning channel tags. Previously, the prepare_structured_tag() block was only entered when struct_out was already a StructuredOutputsParams instance.

Alternative: Keep the existing behavior where reasoning channel tags are only applied when the user explicitly requests some form of structured output.

Why we chose this: Without structural tags, GPT-OSS models emit raw Harmony format (<|channel|>analysis<|message|>...) that the reasoning parser must post-hoc parse. With structural tags, xgrammar enforces the channel structure at decode time, which is more robust and enables future optimizations. This also means the reasoning parser's is_reasoning_end state machine (which has had multi-turn bugs per #34454) is supplemented by grammar-level enforcement.

What reviewers might disagree with: This changes default behavior for all GPT-OSS requests that don't request structured output. If a model produces valid output without structural tags but would be over-constrained with them, this could cause regressions. We don't have e2e validation of this path yet.

5. json_object mapped to {"type": "object"} in structural tag content

What we chose: In _constraint_to_content_format(), json_object=True is converted to {"type": "json_schema", "json_schema": {"type": "object"}} for embedding in the structural tag.

Alternative: Map it to a dedicated json_object content format type if xgrammar supports one, or skip embedding entirely and let the existing json_object handling in the structured output backend handle it outside the structural tag.

Why we chose this: xgrammar's TagFormat.content expects one of its known format types (json_schema, regex, grammar, etc.). {"type": "object"} is the minimal JSON schema that enforces "output must be a JSON object" — semantically equivalent to json_object mode. This ensures the constraint is properly scoped to the <|channel|>final region for reasoning models rather than being applied globally.

6. Adding final_content_format parameter to the base class prepare_structured_tag()

What we chose: We added final_content_format: dict | None = None as an optional parameter on ReasoningParser.prepare_structured_tag() in the base class, with a default of None that preserves backward compatibility.

Alternative: Only add the parameter on GPTOSSReasoningParser and handle the dispatch in serving.py with a type check or capability flag. Or create a separate method like prepare_structured_tag_with_constraint().

Why we chose this: The base class change is backward-compatible (default None, existing implementations don't need changes). The concept of "scope this content constraint to the model's final output region" is generic — it's not GPT-OSS-specific. Other reasoning models (Qwen3, DeepSeek-R1, future models) with structural tag support would benefit from the same interface. Keeping it on the base class establishes a clean contract.

What reviewers might disagree with: This couples content constraint format knowledge (xgrammar dict format) to the reasoning parser interface. If vLLM ever supports a non-xgrammar structured output backend, this dict format may not apply. A more abstract interface (e.g., passing StructuredOutputsParams directly) might be more future-proof.

Related Issues, PRs, and RFCs

Directly Addressed by This PR

# Title Status How This PR Relates
#34857 Responses API & Tool Calling H1 2026 roadmap Open Explicitly lists "guided decode and structured outputs" as focus area. This PR delivers that.
#23120 Structured output not correctly enforced with GPT-OSS Open Root cause: grammar bitmasks applied from token 0 without structural tag channel separation. This PR fixes the Responses API path.
#26288 schema field becomes None in streaming with json_schema Closed Root-cause analysis of the schema_/schema alias bug in streaming. The .model_dump() removal in this PR fixes it.
#34611 Fix ResponseCreatedEvent ValidationError for json_schema in streaming Open Proposes removing .model_dump() in streaming path. We adopt this approach.
#26356 Fix json schema alias serializing when streaming Open Alternative fix (add by_alias=True). We prefer #34611's approach (pass objects directly).
#26822 Fix crash when text type response_format received Merged Added validation for type: "text" passthrough. Our json_object handling follows the same pattern.
#26639 ValueError: No valid structured output parameter found Closed (by #26822) The json_object gap in the Responses API could produce similar errors. Our Step 1 prevents this.

Foundation This PR Builds On

# Title Status Relevance
#33709 Enable generic structured_outputs for responses API Merged Added the structured_outputs field to ResponsesRequest. Our work builds on this.
#32609 Add sampling parameters to Responses API Merged Established to_sampling_params() infrastructure on ResponsesRequest.
#32712 Initial Parser for Responses API Merged Introduced Parser/ParserManager and the structural tag preparation block we're extending.
#34454 Fix structured output in multi-turn GPT-OSS Merged Fixed premature grammar bitmask activation from previous-turn markers. Our structural tag approach inherently avoids this class of bug by constraining grammar to the `<
#32791 chat.completions returns null for GPT-OSS multi-turn with json_object Closed (by #34454) Same root cause as #23120. Our approach prevents this by design.

Related RFCs

# Title Status Design Decision
#19097 RFC: Response format extensions for structured outputs Closed Led to the structured_outputs field. We reuse StructuredOutputsParams rather than creating new types.
#33381 RFC: Align with openresponses spec Open Argues vLLM-specific extensions should go through extension mechanism. Decision: We keep the existing structured_outputs field (already merged in #33709) and also fix the standard text.format path. No new protocol extensions.
#29632 RFC: Force EOS when grammar terminates Open When grammar is satisfied, model may not produce EOS immediately. Out of scope for this PR but noted as a follow-up.
#16313 Support structured output + tool call together Open Tool calls + JSON schema in one request. Our structural tag approach naturally supports this since tool channels and the final channel are independent tags.
#33249 Add structured_outputs as instance field on ResponsesRequest Open Promotes structured_outputs from local var to field for tool parser mutation. Compatible with our changes; we support both the field path and the text.format path.

Changes

File Change
vllm/entrypoints/openai/responses/protocol.py Add json_object handling in to_sampling_params()
vllm/entrypoints/openai/responses/serving.py Add _constraint_to_content_format() helper; rewire structural tag preparation block; fix streaming .model_dump()
vllm/reasoning/abs_reasoning_parsers.py Add final_content_format param to prepare_structured_tag() base class
vllm/reasoning/gptoss_reasoning_parser.py Implement final_content_format — append `<
tests/entrypoints/openai/responses/test_structured_output.py New — unit tests for _constraint_to_content_format
tests/v1/structured_output/test_gptoss_structural_tags.py Extend with constraint embedding tests
tests/entrypoints/openai/responses/test_sampling_params.py Extend with json_object test
.gitignore Add local_test/

Test plan

  • Unit tests pass: pytest tests/entrypoints/openai/responses/test_structured_output.py tests/v1/structured_output/test_gptoss_structural_tags.py tests/entrypoints/openai/responses/test_sampling_params.py -v
  • No regressions in responses unit tests
  • Pre-commit passes on all changed files
  • E2e with GPT-OSS model: verify json_schema + reasoning produces valid JSON with reasoning properly separated
  • E2e with Qwen3: verify json_schema, json_object, and streaming all work (non-regression)

Out of Scope (follow-ups)

  • Chat Completions has the same gap (#23120) — serving_chat.py never calls prepare_structured_tag(). Same fix pattern applies but is a separate PR targeting the chat completions path.
  • strict field forwarding from ResponseFormatTextJSONSchemaConfig — low priority, vLLM always enforces strictly.
  • Force EOS when grammar terminates (#29632) — separate design discussion affecting all APIs.
  • OpenResponses alignment (#33381) — policy decision about whether structured_outputs should go through extension mechanism.
  • structured_outputs as instance field (#33249) — promotes structured_outputs from local variable to field for tool parser mutation. Compatible with our changes but independent concern.

…dding

When a user requests JSON schema enforcement (text.format.type=json_schema)
with a reasoning model (GPT-OSS), the grammar constraint was never scoped
to the final output channel. This caused grammar bitmasks to be applied
from token 0, clobbering reasoning output.

Fix by embedding content constraints (json_schema, json_object, regex,
grammar, choice) inside the structural tag's <|channel|>final region
using xgrammar's native TriggeredTagsFormat support. This ensures grammar
enforcement only applies within the final output region, not during
reasoning.

Also:
- Handle text.format.type=json_object (was silently ignored)
- Fix streaming + json_schema alias bug (.model_dump() dropped schema alias)
- Apply reasoning channel tags even when no structured output is requested
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances structured output capabilities, especially for reasoning models, by embedding content constraints within structural tags. It also adds support for json_object format and fixes a streaming bug. My review found one high-severity issue where options for structured output generation were being dropped when combining with reasoning. I've suggested a fix to preserve these options.

garrio-1 and others added 2 commits March 3, 2026 10:20
…nstraints

When creating a new StructuredOutputsParams with the structural_tag,
use dataclasses.replace() to clear content constraint fields while
preserving user-specified options like disable_any_whitespace,
disable_fallback, disable_additional_properties, and whitespace_pattern.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants