Commit 3a9d560 (parent 10c878d)

fix: update responses limitations doc to track latest state

1 file changed: docs/docs/providers/openai_responses_limitations.mdx (90 additions & 64 deletions)
@@ -5,41 +5,12 @@ sidebar_label: Limitations of Responses API
 sidebar_position: 1
 ---
 
-## Unresolved Issues
+## Issues
 
 This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. The comparison reflects the state of OpenAI's API as of October 6, 2025 (OpenAI client version `openai==1.107`).
 
 See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for details of any new functionality that has been added since that date. Links to issues are included so readers can check status, post comments, and/or subscribe for updates on any limitations of specific interest to them. We would also love feedback on any use cases you try that do not work, to help prioritize the pieces left to implement.
 
 Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work and does not already have an open issue.
 
-### Instructions
-
-**Status:** Partial Implementation + Work in Progress
-
-**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)
-
-In Llama Stack, the instructions parameter is already implemented for creating a response, but it is not yet included in the output response object.
-
----
-
-### Streaming
-
-**Status:** Partial Implementation
-
-**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)
-
-Streaming functionality for the Responses API is partially implemented and does work to some extent, but some streaming response objects that would be needed for full compatibility are still missing.
-
---
-
-### Prompt Templates
-
-**Status:** Partial Implementation
-
-**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)
-
-OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing. This feature is under development for Llama Stack.
-
----
-
 ### Web-search tool compatibility
 
 **Status:** Partial Implementation
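To make the web-search compatibility question above concrete, here is a minimal sketch of a Responses request that enables the built-in web-search tool. The model name and query are illustrative, and the tool `type` string follows OpenAI's documentation (some API versions use `web_search_preview`); the example only builds and inspects the payload rather than calling a server.

```python
# Sketch of a Responses API request enabling the built-in web-search tool.
# The payload shape follows OpenAI's Responses API; whether a given Llama Stack
# deployment accepts it is the compatibility question tracked above.
payload = {
    "model": "gpt-4o-mini",  # illustrative model name
    "input": "What changed in the latest Llama Stack release?",
    "tools": [{"type": "web_search"}],
}

# A client would POST this to /v1/responses; here we only inspect the shape.
tool_types = [tool["type"] for tool in payload["tools"]]
print(tool_types)  # ['web_search']
```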
@@ -111,19 +82,9 @@ In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or r
 
 **Status:** Not Implemented
 
-OpenAI's platform allows users to track agentic users using a safety identifier passed with each response. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.
-
----
-
-### Connectors
-
-**Status:** Not Implemented
-
-Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).
+**Issue:** [#4381](https://github.com/llamastack/llama-stack/issues/4381)
 
-**Open Questions:**
-- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
-- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?
+OpenAI's platform allows account holders to track agentic end users via a safety identifier passed with each response. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.
 
 ---
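The safety-identifier capability discussed in this hunk amounts to one extra request field. A hedged sketch of the payload shape follows: the `safety_identifier` parameter name is taken from OpenAI's documentation, the model name is illustrative, and hashing the raw user id is just one way to avoid sending PII.

```python
import hashlib

def build_request(user_id: str, prompt: str) -> dict:
    """Build a Responses payload carrying a stable, non-PII safety identifier."""
    safety_id = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:32]
    return {
        "model": "gpt-4o-mini",          # illustrative
        "input": prompt,
        "safety_identifier": safety_id,  # parameter name per OpenAI's API docs
    }

req = build_request("user-1234", "Hello")
print(len(req["safety_identifier"]))  # 32
```

The same end user always maps to the same identifier, so abuse reports can be tied back to one account without exposing the raw id.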

@@ -156,16 +117,6 @@ It enables users to also get logprobs for alternative tokens.
 
 ---
 
-### Max Tool Calls
-
-**Status:** Not Implemented
-
-**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)
-
-The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response. This feature needs full implementation and documentation.
-
----
-
 ### Max Output Tokens
 
 **Status:** Not Implemented
@@ -186,16 +137,6 @@ The return object from a call to Responses includes a field for indicating why a
 
 ---
 
-### Metadata
-
-**Status:** Not Implemented
-
-**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)
-
-Metadata allows you to attach additional information to a response for your own reference and tracking. It is not implemented in Llama Stack.
-
----
-
 ### Background
 
 **Status:** Not Implemented
@@ -249,6 +190,8 @@ Sampling allows MCP tools to query the generative AI model. See the [MCP specifi
 - If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
 - Does this work in Llama Stack?
 
+---
+
 ### Prompt Caching
 
 **Status:** Unknown
@@ -262,14 +205,66 @@ OpenAI provides a [prompt caching](https://platform.openai.com/docs/guides/promp
 
 ---
 
+## Coming Soon
+
+---
+
 ### Parallel Tool Calls
 
-**Status:** Rumored Issue
+**Status:** In Progress
+
+Align Llama Stack's Responses API `parallel_tool_calls` behavior with OpenAI's and harden the implementation with tests.
 
-There are reports that `parallel_tool_calls` may not work correctly. This needs verification and a ticket should be opened if confirmed.
+---
+
+### Connectors
+
+**Status:** In Progress
+
+**Issue:** [#4061](https://github.com/llamastack/llama-stack/issues/4061)
+
+Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).
+
+**Open Questions:**
+- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
+- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?
+
+---
+
+### Server-Side Telemetry
+
+**Status:** Merged [Planned 0.4.z]
+
+**Issue:** [#3806](https://github.com/llamastack/llama-stack/issues/3806)
+
+Support OpenTelemetry as the preferred way to instrument Llama Stack.
+
+**Remaining Issues:**
+- Some data needs to be converted to follow the OpenTelemetry semantic conventions for generative AI data.
+
+---
+
+### Max Tool Calls
+
+**Status:** Merged [Planned 0.4.z]
+
+**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)
+
+The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response.
+
+---
+
+### Metadata
+
+**Status:** Merged [Planned 0.4.z]
+
+**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)
+
+Metadata allows you to attach additional information to a response for your own reference and tracking.
 
 ---
 
+
 ## Resolved Issues
 
 The following limitations have been addressed in recent releases:
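Several of the "Coming Soon" items in the hunk above (`parallel_tool_calls`, `max_tool_calls`, `metadata`) are plain request parameters once merged. A hedged sketch of how they combine in a single Responses payload; the model and values are illustrative, and the 16-pair metadata cap is OpenAI's documented limit, not something verified against Llama Stack.

```python
# Sketch: one Responses payload exercising the three "Coming Soon" parameters.
payload = {
    "model": "gpt-4o-mini",             # illustrative
    "input": "Look up the weather in two cities.",
    "parallel_tool_calls": True,        # let the model emit tool calls in parallel
    "max_tool_calls": 3,                # cap tool executions for this response
    "metadata": {"run_id": "abc-123"},  # free-form key/value tracking data
}

# OpenAI documents a limit of 16 metadata key/value pairs; guard before sending.
assert len(payload["metadata"]) <= 16
print(payload["max_tool_calls"])  # 3
```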
@@ -297,3 +292,34 @@ The `require_approval` parameter for MCP tools in the Responses API now works co
 **Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)
 
 MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.
+
+---
+
+### Streaming
+
+**Status:** ✅ Resolved
+
+**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)
+
+Streaming functionality for the Responses API is feature-complete and released.
+
+---
+
+### Prompt Templates
+
+**Status:** ✅ Resolved
+
+**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)
+
+OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing.
+
+---
+
+### Instructions
+
+**Status:** ✅ Resolved
+
+**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)
+
+The Responses API request and response objects now support the *instructions* field.
+
+---
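Since Streaming and Instructions are now marked resolved, a short sketch of consuming a Responses stream may help readers verify their own deployments. The event type names follow OpenAI's streaming protocol; to stay self-contained the example drives the loop with stub events instead of a live server.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Minimal stand-in for a Responses streaming event."""
    type: str
    delta: str = ""

def collect_text(events) -> str:
    """Accumulate assistant text from a stream of Responses events."""
    chunks = []
    for ev in events:
        if ev.type == "response.output_text.delta":
            chunks.append(ev.delta)
        elif ev.type == "response.completed":
            break
    return "".join(chunks)

stream = [
    Event("response.output_text.delta", "Hel"),
    Event("response.output_text.delta", "lo"),
    Event("response.completed"),
]
print(collect_text(stream))  # Hello
```

Against a live server, the same loop would iterate the generator returned by `client.responses.create(..., stream=True)`.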
