model : gpt-oss add response_format support by aldehir · Pull Request #15494 · ggml-org/llama.cpp

aldehir · 2025-08-22T02:05:02Z

Add response_format support to gpt-oss models.

The generic grammar implementation does not work well for gpt-oss, as this curl example shows:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "response_format": {
      "type": "json_object",
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "City Details",
        "description": "A simple object containing key details about a city.",
        "type": "object",
        "properties": {
          "country": {
            "description": "The country where the city is located.",
            "type": "string"
          },
          "landmarks": {
            "description": "A list of notable landmarks in the city.",
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "required": ["country","landmarks"]
      }
    },
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a helpful assistant designed to output JSON. For a given city, provide its country, and a list of three notable landmarks."
          },
          {
            "type": "text",
            "text": "# Response Formats\n## city\n### {\"type\": \"object\", \"properties\": { \"country\": { \"description\": \"The country where the city is located.\", \"type\": \"string\" }, \"landmarks\": { \"description\": \"A list of notable landmarks in the city.\", \"type\": \"array\", \"items\": { \"type\": \"string\" } } }, \"required\": [\"country\",\"landmarks\"] }"
          }
        ]
      },
      {
        "role": "user",
        "content": "Zürich"
      }
    ]
  }'

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\n\n  \"country\": \"Switzerland\"\n\n  , \"landmarks\":[\n\n\"[\"]\n\n  }"
      }
    }
  ],
  ...

Note the malformed landmarks value: the array contains only a stray "[" string instead of landmark names.
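
To make the problem concrete, the content string from the response above can be decoded with a JSON parser. This is a minimal Python sketch using only the standard library; the string is copied from the response above and the variable names are illustrative.

```python
import json

# "content" value copied from the response above (JSON escapes resolved).
content = '{\n\n  "country": "Switzerland"\n\n  , "landmarks":[\n\n"["]\n\n  }'

parsed = json.loads(content)

# The text is syntactically valid JSON and has both required keys...
assert set(parsed) == {"country", "landmarks"}

# ...but the landmarks array holds a single stray "[" instead of landmark names.
print(parsed["landmarks"])
```

The output passes a purely structural check, which is why the problem only shows up when looking at the values themselves.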

This PR wraps the response_format schema in a harmony-aware grammar so the model can answer properly:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user gave the city \"Zürich\". We need to output JSON in the defined schema. The schema says: object with properties: \"country\" (string) and \"landmarks\" (array of strings). It's required at least those two. We must supply. Provide country: Switzerland. Landmarks: choose 3 notable landmarks: \"Bahnhofstrasse\", \"Lake Zürich (Limmat, scenic)\", \"Zürcher Mozartplatz and its cathedral\"? Let's find known landmarks: \"Château Fraiture\"? Wait landmarks: \"Lake Zürich\", \"Bahnhofstrasse\", \"Old Town\" (Altstadt), \"Kunsthaus Zürich\". Choose 3: \"Bahnhofstrasse\", \"Lake Zürich\", \"Kunsthaus Zürich\". Compose JSON. Ensure it's valid according to schema. Should be:\n\n{\n \"country\": \"Switzerland\",\n \"landmarks\": [\n   \"Bahnhofstrasse\",\n   \"Lake Zürich\",\n   \"Kunsthaus Zürich\"\n ]\n}\n\nMake sure no extra keys. Provide only JSON.",
        "content": "{\"country\":\"Switzerland\",\"landmarks\":[\"Bahnhofstrasse\",\"Lake Zürich\",\"Kunsthaus Zürich\"]}"
      }
    }
  ],
  ...

fixes #15276

common/chat.cpp

samshipengs · 2025-08-24T23:33:52Z

@aldehir thanks for the change, and it seems like now my chat completion request with response_format is working with the llama.cpp backend.

one question, would the grammar rule also affect the reasoning token generation of gpt-oss? i.e. forcing the reasoning tokens to be generated in the json schema format, which certainly would impact the performance.

aldehir · 2025-08-24T23:55:53Z

@samshipengs with reasoning-format == auto, any reasoning done by the model will present itself in the reasoning_content field. The content field should not contain reasoning traces. Does that answer your question?

samshipengs · 2025-08-25T01:20:21Z

@aldehir I haven't looked at reasoning-format == auto, I'm currently only setting the reasoning level, e.g. "low". Will take a look. I'm relatively new to using llama and even structured output, so I'm not sure what I was asking makes sense. Basically I was concerned that a grammar (grammar-based constrained sampling?) is active not only in final output generation but in every reasoning step as well, where we don't want to constrain reasoning tokens. If you are saying the reasoning tokens are still generated as they are (harmony format?) and only the output gets constrained by the grammar sampling, then that sounds good. In that case it perhaps really is the case that gpt-oss20b is not performing well on my task (benchmarking against the existing model being used, e.g. gpt4.1-mini).

aldehir · 2025-08-25T03:12:00Z

@samshipengs Ah, ok. The grammar for gpt-oss when using response_format does not constrain reasoning. It gives the model the flexibility to reason and only constrains the final message. You can verify by seeing if reasoning_content exists and is populated in the response.

If you're finding reasoning traces in your structured output, I would verify you are passing in --jinja. Otherwise it may be as you say, the model does not perform well for your task.
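
The check described above can be sketched in Python. This is a minimal example under stated assumptions: `split_message` is a hypothetical helper, not part of llama.cpp, and `sample` is an abbreviated copy of the response from the PR description rather than live server output.

```python
import json

def split_message(response: dict) -> tuple[str, dict]:
    """Return (reasoning text, parsed final message) from a chat completion."""
    message = response["choices"][0]["message"]
    # Reasoning is expected in its own field, separate from the final message.
    reasoning = message.get("reasoning_content") or ""
    # The constrained final message should be a bare JSON document.
    final = json.loads(message["content"])
    return reasoning, final

# Abbreviated copy of the response shown in the PR description.
sample = {
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "message": {
            "role": "assistant",
            "reasoning_content": "The user gave the city \"Zürich\". ...",
            "content": "{\"country\":\"Switzerland\",\"landmarks\":"
                       "[\"Bahnhofstrasse\",\"Lake Zürich\",\"Kunsthaus Zürich\"]}",
        },
    }]
}

reasoning, final = split_message(sample)
print("reasoning present:", bool(reasoning))
print(final["landmarks"])
```

If `json.loads` raises here, the content field contains something other than the constrained JSON, for example leaked reasoning text.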

samshipengs · 2025-08-25T19:55:46Z

@aldehir I was using --jinja, i now turned it off, my task is simply classification given a large text body.

I noticed that if I don't use structured output, i.e. don't pass in response_format, it seems to give me a more sensible answer (I'm looking at the final channel of the harmony format response) compared to the parsed result I get from passing a pydantic model in response_format.

Is the grammar-based constrained decoding in llama.cpp done by GBNF? Do we know if OpenAI (for their commercial models) uses the same constrained decoding technique?
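
For readers comparing raw harmony output with the parsed result, as described in this comment, the channels can be separated with a small script. This is a rough Python sketch: the token layout (`<|channel|>NAME<|message|>TEXT` terminated by `<|end|>` or `<|return|>`) is an assumption about the harmony format, `split_channels` is a hypothetical helper, and the sample string is made up for illustration.

```python
import re

# Assumed harmony layout: <|channel|>NAME ... <|message|>TEXT, where TEXT runs
# until <|end|>, <|return|>, the next <|start|>, or the end of the string.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(\w+).*?<\|message\|>(.*?)"
    r"(?=<\|end\|>|<\|return\|>|<\|start\|>|$)",
    re.DOTALL,
)

def split_channels(raw: str) -> dict[str, str]:
    """Map each channel name to its message text (last occurrence wins)."""
    return {name: text for name, text in CHANNEL_RE.findall(raw)}

# Made-up raw completion for illustration only.
raw = (
    "<|channel|>analysis<|message|>Pick the best label.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>{\"label\": \"spam\"}<|return|>"
)

channels = split_channels(raw)
print(channels["final"])
```

Comparing `channels["final"]` from an unconstrained run with the content field of a constrained run shows whether the two answers actually differ.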

aldehir · 2025-08-25T21:09:45Z

@samshipengs the grammar is defined in GBNF, but I don't know the specifics of the constrained decoding implementation.

If you can provide an example of such a task, I can look further into it.

model : gpt-oss add response_format support

320536e

ggerganov approved these changes Aug 22, 2025

ngxson approved these changes Aug 22, 2025

aldehir merged commit 32732f2 into ggml-org:master Aug 22, 2025
48 checks passed

firecoperana mentioned this pull request Aug 23, 2025

Tool calls support from mainline ikawrakow/ik_llama.cpp#723

Merged

4 tasks

qnixsynapse pushed a commit to janhq/llama.cpp that referenced this pull request Aug 25, 2025

model : gpt-oss add response_format support (ggml-org#15494)

95ad6b1

aldehir mentioned this pull request Aug 30, 2025

feat: nemotron thinking & toolcalling support #15676

Merged

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 6, 2025

Revert "model : gpt-oss add response_format support (ggml-org#15494)"

c90a17d

This reverts commit 32732f2.

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

model : gpt-oss add response_format support (#15494)

66e97cb

aldehir merged 1 commit into ggml-org:master from aldehir:harmony-structured-output

4 participants
