Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server #19138
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Description:
The llama.cpp OpenAI-compatible server currently supports endpoints like /v1/chat/completions, but does not support the newer OpenAI Responses API (/v1/responses).
Many OpenAI client SDKs and tools are moving toward the unified Responses API, which supports structured outputs, tool calls, and multimodal responses in a single endpoint. Lack of /v1/responses support makes it harder to use llama.cpp as a drop-in replacement for OpenAI backends without an additional proxy or translation layer.
Requested feature:
Add native support for the /v1/responses endpoint in llama-server, aligned as closely as possible with OpenAI’s Responses API request/response format.
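For reference, a minimal non-streaming Responses API exchange looks roughly like the sketch below. Field names (`input`, `instructions`, `output`, `output_text`) follow OpenAI's published format, but this is an approximation for illustration, not a normative spec, and the model name is hypothetical:

```python
# Sketch of the request/response shapes llama-server would need to handle
# for POST /v1/responses (non-streaming case).

request = {
    "model": "local-model",                 # hypothetical local model name
    "input": "Say hello in one word.",
    "instructions": "You are a terse assistant.",
}

# A minimal completed response body:
response = {
    "id": "resp_123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello"}
            ],
        }
    ],
}

def output_text(resp: dict) -> str:
    """Concatenate all output_text parts, mirroring the convenience
    accessor the OpenAI SDKs expose for Responses objects."""
    return "".join(
        part["text"]
        for item in resp.get("output", [])
        if item.get("type") == "message"
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    )

print(output_text(response))
```

The nested `output` array (rather than a flat `choices` list) is the main structural difference from `/v1/chat/completions` that the server would need to emit.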
Benefits:
- Improved compatibility with modern OpenAI SDKs
- Easier migration from OpenAI APIs to llama.cpp
- Reduced need for custom proxy or request rewriting layers
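To make the translation-layer point concrete: until native support lands, users have to run a small proxy that rewrites Responses-style requests into the chat-completions format llama-server already understands. A simplified sketch of that mapping (function name and field handling are illustrative assumptions; it covers only plain string input, with no tools or streaming):

```python
def responses_to_chat_completions(req: dict) -> dict:
    """Map a /v1/responses request body onto an equivalent
    /v1/chat/completions body. Simplified sketch: string `input` only,
    no tool calls, no multimodal parts, no streaming."""
    messages = []
    if req.get("instructions"):
        # `instructions` plays the role of a system message.
        messages.append({"role": "system", "content": req["instructions"]})
    if isinstance(req.get("input"), str):
        messages.append({"role": "user", "content": req["input"]})
    out = {"model": req["model"], "messages": messages}
    # Responses uses `max_output_tokens`; chat/completions uses `max_tokens`.
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]
    return out

body = responses_to_chat_completions({
    "model": "local-model",
    "instructions": "Be brief.",
    "input": "Hi",
    "max_output_tokens": 64,
})
print(body)
```

Native `/v1/responses` support would make this kind of shim unnecessary.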
Thanks for the great work on llama.cpp!
Motivation
I want to use ClaudeCode with llama-server offline, but llama-server does not currently support the necessary endpoints.
Possible Implementation
No response