Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server #19138

@AF-2000

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Description:
The llama.cpp OpenAI-compatible server currently supports endpoints like /v1/chat/completions, but does not support the newer OpenAI Responses API (/v1/responses).

Many OpenAI client SDKs and tools are moving toward the unified Responses API, which supports structured outputs, tool calls, and multimodal responses in a single endpoint. Lack of /v1/responses support makes it harder to use llama.cpp as a drop-in replacement for OpenAI backends without an additional proxy or translation layer.

Requested feature:
Add native support for the /v1/responses endpoint in llama-server, aligned as closely as possible with OpenAI’s Responses API request/response format.

Benefits:

Improved compatibility with modern OpenAI SDKs

Easier migration from OpenAI APIs to llama.cpp

Reduced need for custom proxy or request rewriting layers
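For the text-only case, the Responses API maps fairly directly onto the existing /v1/chat/completions handling, so the core of the feature is a request/response translation. A rough sketch of that mapping (field names follow OpenAI's published Responses API; the helper names are hypothetical, and this ignores streaming, tool calls, and multimodal input):

```python
def responses_to_chat_request(req: dict) -> dict:
    """Map a Responses API request body onto a /v1/chat/completions body.

    `input` may be a plain string or a list of message objects;
    `instructions` becomes a system message; `max_output_tokens`
    corresponds to chat completions' `max_tokens`.
    """
    inp = req.get("input", "")
    if isinstance(inp, str):
        messages = [{"role": "user", "content": inp}]
    else:
        messages = [
            {"role": m.get("role", "user"), "content": m.get("content", "")}
            for m in inp
        ]
    if req.get("instructions"):
        messages.insert(0, {"role": "system", "content": req["instructions"]})
    out = {"model": req.get("model"), "messages": messages}
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]
    return out


def chat_to_responses_reply(resp: dict) -> dict:
    """Wrap a chat completion result in a Responses-style body.

    The Responses API returns an `output` array of items; a text reply
    is a `message` item whose content holds `output_text` parts.
    """
    text = resp["choices"][0]["message"]["content"]
    return {
        "object": "response",
        "model": resp.get("model"),
        "output": [{
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text}],
        }],
    }
```

A shim like this is what external proxies do today; native support in llama-server would fold the same mapping into the existing HTTP handlers.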

Thanks for the great work on llama.cpp!

Motivation

I want to use Claude Code with llama-server offline, but llama-server currently doesn't support the necessary endpoints.

Possible Implementation

No response

Metadata

Labels: enhancement (New feature or request)