sidebar_label: Limitations of Responses API
sidebar_position: 1
---

## Issues

This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. The comparison reflects OpenAI's APIs as of October 6, 2025 (OpenAI client version `openai==1.107`).
See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for details of any functionality added since that date. Links to issues are included so readers can check status, post comments, and subscribe for updates on any limitations of specific interest to them. We would also welcome feedback on any use cases you try that do not work, to help prioritize the pieces left to implement.
Please open a new issue in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work and does not already have an open issue.

### Web-search tool compatibility

**Status:** Partial Implementation

Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create) for the web search tool entry in a Responses tool list says:

> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.

Llama Stack now supports both `web_search` and `web_search_2025_08_26` types, matching OpenAI's API. For backward compatibility, Llama Stack also supports `web_search_preview` and `web_search_preview_2025_03_11` types.

The OpenAI web search tool also has fields for `filters` and `user_location` which are not yet implemented in Llama Stack. If feasible, it would be good to support these too.
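
As an illustration, here is a minimal sketch of selecting the built-in web-search tool through an OpenAI-compatible Python client. The base URL, API key, and model name are placeholders for your own Llama Stack deployment:

```python
from openai import OpenAI

# Placeholder endpoint and model; point these at your Llama Stack deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What changed in the latest Llama Stack release?",
    # `web_search` and `web_search_2025_08_26` are accepted; the older
    # `web_search_preview` aliases also still work for backward compatibility.
    tools=[{"type": "web_search"}],
)
print(response.output_text)
```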

---

### Reasoning Content

**Status:** Not Implemented

**Issue:** [#4404](https://github.com/llamastack/llama-stack/issues/4404)

The Responses API allows you to preserve reasoning context between turns with the `reasoning.encrypted_content` include value.
In Llama Stack the field is currently accepted as a no-op and still needs to be wired up to providers.
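
A sketch of the intended call shape, using the same placeholder endpoint and model as above. On OpenAI's platform this is typically combined with `store=False` so reasoning can be carried client-side; in Llama Stack the value is currently accepted but returns nothing:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Plan a three-step approach to debugging a flaky test.",
    # Accepted by Llama Stack today, but a no-op: no encrypted reasoning
    # content is returned yet.
    include=["reasoning.encrypted_content"],
    store=False,
)
```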

---

*(Unchanged content not shown, including the Response branching section discussed in the Agents vs OpenAI Responses API documentation.)*

---

### Safety Identification and Tracking

**Status:** Not Implemented

**Issue:** [#4381](https://github.com/llamastack/llama-stack/issues/4381)

OpenAI's platform allows users to track the end users of agentic applications via a safety identifier passed with each request. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.
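
For reference, a hedged sketch of what this looks like on OpenAI's side, assuming the `safety_identifier` request parameter; Llama Stack does not currently act on it:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Summarize this support ticket.",
    # A stable, opaque identifier for the end user making the request, so
    # policy violations can be traced back and acted on by the account holder.
    safety_identifier="user-7f3a9c",
)
```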

---

*(Unchanged content not shown, including the Responses `service_tier` field that can be used to prioritize access.)*

---

### Max Output Tokens

**Status:** Not Implemented

*(Unchanged content not shown.)*

---

### Background

**Status:** Not Implemented

*(Unchanged content not shown.)*

Sampling allows MCP tools to query the generative AI model; see the MCP specification for details.

- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?

---

### Prompt Caching

**Status:** Unknown

*(Unchanged content not shown, including OpenAI's [prompt caching](https://platform.openai.com/docs/guides/prompt-caching) guide.)*

---

## Coming Soon

### Parallel Tool Calls

**Status:** In Progress

Work is in progress to align Llama Stack's Responses parallel tool call behavior with OpenAI's and to harden the implementation with tests.
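
A sketch of the behavior being aligned, using a hypothetical `get_weather` function tool and placeholder endpoint and model; `parallel_tool_calls` controls whether the model may emit more than one tool call in a single turn:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="What is the weather in Paris and in Rome?",
    tools=[{
        "type": "function",
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # With parallel tool calls disabled, at most one tool call should be
    # emitted per model turn.
    parallel_tool_calls=False,
)
```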

---

### Connectors

**Status:** In Progress

**Issue:** [#4061](https://github.com/llamastack/llama-stack/issues/4061)

Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).
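
For context, a hedged sketch of how a connector appears in OpenAI's API today: a remote MCP tool entry that names a provider-managed connector via `connector_id` and carries caller-supplied authorization. The connector ID, token, endpoint, and model below are placeholders, and none of this is implemented in Llama Stack yet:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="List the three most recently modified files in my Drive.",
    tools=[{
        "type": "mcp",
        "server_label": "google_drive",
        "connector_id": "connector_googledrive",  # assumed connector identifier
        "authorization": "<oauth-access-token>",   # caller-supplied credential
        "require_approval": "never",
    }],
)
```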

**Open Questions:**
- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?

---

### Top Logprobs

**Status:** In Progress

**Issue:** [#3552](https://github.com/llamastack/llama-stack/issues/3552)

The `top_logprobs` parameter from OpenAI's Responses API extends the functionality obtained by including `message.output_text.logprobs` in the `include` parameter list (as discussed in the Include section below).
It enables users to also get logprobs for alternative tokens.
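
A sketch of the intended call shape (placeholder endpoint and model), combining `top_logprobs` with the related `include` value:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Reply with a single word: yes or no.",
    include=["message.output_text.logprobs"],
    # Also request logprobs for up to three alternative tokens at each position.
    top_logprobs=3,
)
```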

---

### Server Side Telemetry

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3806](https://github.com/llamastack/llama-stack/issues/3806)

Adds support for OpenTelemetry as the preferred way to instrument Llama Stack.

**Remaining Issues:**
- Some data still needs to be converted to follow the OpenTelemetry semantic conventions for GenAI telemetry

---

### Include

**Status:** Merged [Planned 0.4.z]

The `include` parameter allows you to provide a list of values that indicate additional information for the system to include in the model response.
The [OpenAI API](https://platform.openai.com/docs/api-reference/responses/create) specifies the following allowed values for this parameter.

- `web_search_call.action.sources`
- `code_interpreter_call.outputs`
- `computer_call_output.output.image_url`
- `file_search_call.results`
- `message.input_image.image_url`
- `message.output_text.logprobs`
- `reasoning.encrypted_content`

These values are now accepted as fields on the input and output objects for responses and chat completions in Llama Stack, giving full API-surface compatibility.
Of these, `message.output_text.logprobs` is implemented, allowing users to get logprobs output for their inference requests. *Note* that the remaining values are accepted
but not yet implemented; requesting them will not return data, since Llama Stack does not yet support the corresponding built-in tools.
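
A minimal sketch of the part that works today (placeholder endpoint and model): requesting per-token logprobs for the generated text via `include`:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Name one prime number.",
    # Implemented: per-token logprobs on the output text. The other include
    # values listed above are accepted but currently return no data.
    include=["message.output_text.logprobs"],
)
print(response.output_text)
```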

---

### Max Tool Calls

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)

The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response.
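
A sketch of the call shape (placeholder endpoint and model), capping how many tool calls may execute while producing a single response:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Research this topic and summarize what you find.",
    tools=[{"type": "web_search"}],
    # Stop executing tool calls after two have run for this response.
    max_tool_calls=2,
)
```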

---

### Metadata

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)

Metadata allows you to attach additional information to a response for your own reference and tracking.
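
A sketch of the call shape (placeholder endpoint and model); `metadata` is a small map of string keys and values that is stored with, and echoed back on, the response object:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Draft a release note for the 0.4 series.",
    # Free-form key/value pairs for your own reference and tracking.
    metadata={"ticket": "DOCS-123", "team": "docs"},
)
print(response.metadata)
```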

---

### Tool Choice

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3548](https://github.com/llamastack/llama-stack/issues/3548)

In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or requirements for which tools should be used when generating a response.
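
A sketch of the two common forms (placeholder endpoint and model, hypothetical `get_weather` function tool): requiring that some tool is used, or forcing one specific tool:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="What's the weather in Tokyo?",
    tools=[{
        "type": "function",
        "name": "get_weather",  # hypothetical tool, for illustration only
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # Require the model to call some tool before answering; to force this
    # specific tool instead: tool_choice={"type": "function", "name": "get_weather"}
    tool_choice="required",
)
```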

---

## Fixed

The following limitations have been addressed in recent releases:

The `require_approval` parameter for MCP tools in the Responses API now works correctly.

**Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)

MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.

---

### Streaming

**Status:** ✅ Resolved

**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)

Streaming functionality for the Responses API is feature complete and released.
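
A minimal streaming sketch (placeholder endpoint and model); the server emits typed events, ending with a `response.completed` event:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

stream = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    input="Write a haiku about observability.",
    stream=True,
)

for event in stream:
    # Print incremental text as it arrives; other event types are ignored here.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()
```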

---

### Prompt Templates

**Status:** ✅ Resolved

**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)

OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing.
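
For reference, a hedged sketch of how OpenAI exposes reusable prompts in the Responses API; the prompt ID and variable name below are placeholders for a template stored server-side:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    # Reference a stored prompt template by ID and fill in its variables.
    prompt={
        "id": "pmpt_example123",  # placeholder template ID
        "variables": {"customer_name": "Ada"},
    },
)
```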

---

### Instructions

**Status:** ✅ Resolved

**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)

The Responses API request and response objects now support the `instructions` field.
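
A minimal sketch (placeholder endpoint and model) showing the field on the request and echoed back on the response object:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model
    instructions="Answer in exactly one sentence.",
    input="What is the Responses API?",
)
# The instructions used for the request are now included on the response object.
print(response.instructions)
```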

---