# fix: update responses limitations doc to track latest state #4392
@@ -5,52 +5,34 @@ sidebar_label: Limitations of Responses API
sidebar_position: 1
---

## Issues

This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. The comparison reflects OpenAI's API as of October 6, 2025 (OpenAI client version `openai==1.107`).
See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for details of any new functionality added since that date. Links to issues are included so readers can check status, post comments, and/or subscribe for updates on any limitations of particular interest. We would also love feedback on any use cases you try that do not work, to help prioritize the pieces left to implement.
Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work and does not already have an open issue.

### Web-search tool compatibility

**Status:** Partial Implementation

Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create) for the web search tool in a Responses tool list says:

> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.

Llama Stack now supports both the `web_search` and `web_search_2025_08_26` types, matching OpenAI's API. For backward compatibility, Llama Stack also supports the `web_search_preview` and `web_search_preview_2025_03_11` types.

The OpenAI web search tool also has `filters` and `user_location` fields, which are not yet implemented in Llama Stack. If feasible, it would be good to support these too.
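
A minimal sketch of a web-search request against a Llama Stack server's OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders, not fixed values:

```python
from openai import OpenAI

# Placeholder endpoint and model; adjust for your Llama Stack deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What changed recently in the OpenAI Responses API?",
    tools=[{"type": "web_search"}],  # `web_search_2025_08_26` is also accepted
)
print(response.output_text)
```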

---

### Reasoning Content

**Status:** Not Implemented

**Issue:** [#4404](https://github.com/llamastack/llama-stack/issues/4404)

The Responses API allows you to preserve reasoning context between turns with the `reasoning.encrypted_content` include value.
The field exists as a no-op right now and needs to be wired up to providers.
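
As a sketch of the intended usage once the field is wired up (reusing the `client` from the web-search sketch above; the parameter shape follows OpenAI's API):

```python
# Hypothetical once implemented: request encrypted reasoning content so it
# can be passed back on the next turn of a stateless conversation.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Plan a three-step refactor of this module.",
    include=["reasoning.encrypted_content"],
    store=False,  # OpenAI pairs encrypted reasoning with stateless requests
)
```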

---

@@ -77,53 +59,13 @@ Response branching, as discussed in the [Agents vs OpenAI Responses API document

---

### Safety Identification and Tracking

**Status:** Not Implemented

**Issue:** [#4381](https://github.com/llamastack/llama-stack/issues/4381)

OpenAI's platform allows users to track agentic users using a safety identifier passed with each response. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.
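
For reference, a sketch of the OpenAI-side usage that Llama Stack would need to mirror; the `safety_identifier` parameter is OpenAI's and is not accepted by Llama Stack today:

```python
# Sketch of OpenAI's safety-identifier usage; not yet supported in Llama Stack.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Summarize today's support tickets.",
    safety_identifier="user-7d3f",  # stable, anonymized ID for the end user
)
```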

---

@@ -145,27 +87,6 @@ Responses has a field `service_tier` that can be used to prioritize access to in

---

### Max Output Tokens

**Status:** Not Implemented
@@ -186,16 +107,6 @@ The return object from a call to Responses includes a field for indicating why a

---

### Background

**Status:** Not Implemented
@@ -249,6 +160,8 @@ Sampling allows MCP tools to query the generative AI model. See the [MCP specifi
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?

---

### Prompt Caching

**Status:** Unknown
@@ -262,15 +175,106 @@ OpenAI provides a [prompt caching](https://platform.openai.com/docs/guides/promp

---

## Coming Soon

### Parallel Tool Calls

**Status:** In Progress

Align Llama Stack's Responses parallel tool calls behavior with OpenAI's and harden the implementation with tests.
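
A sketch of the behavior being aligned, reusing the `client` from the first example; the function tool is illustrative:

```python
# Disable parallel tool calls so at most one tool is invoked per model turn.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Check the weather in Paris and Tokyo.",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    parallel_tool_calls=False,  # OpenAI defaults this to True
)
```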

---

### Connectors

**Status:** In Progress

**Issue:** [#4061](https://github.com/llamastack/llama-stack/issues/4061)

Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).

**Open Questions:**
- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?
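
For context, OpenAI exposes connectors through its MCP tool type. A sketch of that shape, with a placeholder connector ID and token; Llama Stack does not accept this today:

```python
# Sketch of OpenAI's connector shape; unsupported in Llama Stack for now.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="List my three most recently edited documents.",
    tools=[{
        "type": "mcp",
        "server_label": "google_drive",
        "connector_id": "connector_googledrive",  # provider-managed MCP server
        "authorization": "<OAUTH_ACCESS_TOKEN>",  # placeholder token
        "require_approval": "never",
    }],
)
```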

---

### Top Logprobs

**Status:** In Progress

**Issue:** [#3552](https://github.com/llamastack/llama-stack/issues/3552)

The `top_logprobs` parameter from OpenAI's Responses API extends the functionality obtained by including `message.output_text.logprobs` in the `include` parameter list (as discussed in the Include section below).
It enables users to also get logprobs for alternative tokens.
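
A sketch of the target usage, per OpenAI's parameter definition:

```python
# Request logprobs for the chosen token plus five alternatives per position.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Answer yes or no: is the sky blue?",
    include=["message.output_text.logprobs"],
    top_logprobs=5,  # OpenAI accepts 0-20
)
```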

---

### Server Side Telemetry

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3806](https://github.com/llamastack/llama-stack/issues/3806)

Support OpenTelemetry as the preferred way to instrument Llama Stack.

**Remaining Issues:**
- Some data needs to be converted to follow the OpenTelemetry GenAI semantic conventions

---

### Include

**Status:** Merged [Planned 0.4.z]

The `include` parameter allows you to provide a list of values that indicate additional information for the system to include in the model response.
The [OpenAI API](https://platform.openai.com/docs/api-reference/responses/create) specifies the following allowed values for this parameter.

- `web_search_call.action.sources`
- `code_interpreter_call.outputs`
- `computer_call_output.output.image_url`
- `file_search_call.results`
- `message.input_image.image_url`
- `message.output_text.logprobs`
- `reasoning.encrypted_content`

This change adds all of these values to the input and output objects for responses and chat completions in Llama Stack for full API compatibility.
It also implements `message.output_text.logprobs`, allowing users to get logprobs output from their inference requests. *Note* that the other values are just stubs for now and will not work when invoked, since Llama Stack does not yet support built-in tools.
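
A sketch of the one value implemented so far, `message.output_text.logprobs`; the traversal below assumes the OpenAI response shape, where logprobs ride along with `output_text` content parts:

```python
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Name one primary color.",
    include=["message.output_text.logprobs"],
)
# Assumed shape: logprobs attached to each output_text content part.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.logprobs)
```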

---

### Max Tool Calls

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)

The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response.
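
A sketch of the parameter in use:

```python
# Cap tool invocations at two for this response.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Research the most recent Llama Stack releases.",
    tools=[{"type": "web_search"}],
    max_tool_calls=2,
)
```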

---

### Metadata

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)

Metadata allows you to attach additional information to a response for your own reference and tracking.
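
A sketch of the parameter in use; OpenAI allows up to 16 string key-value pairs:

```python
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Draft a release note.",
    metadata={"team": "docs", "ticket": "LS-1234"},  # echoed back on the response
)
print(response.metadata)
```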

---

### Tool Choice

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3548](https://github.com/llamastack/llama-stack/issues/3548)

In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or requirements for which tools should be used when generating a response.
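
A sketch that forces a specific function tool, following OpenAI's parameter shape; the tool definition is illustrative:

```python
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What's the weather in Oslo?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # Also accepts "auto", "required", or "none".
    tool_choice={"type": "function", "name": "get_weather"},
)
```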

---

## Fixed

The following limitations have been addressed in recent releases:
@@ -297,3 +301,34 @@ The `require_approval` parameter for MCP tools in the Responses API now works co

**Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)

MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.

---

### Streaming

**Status:** ✅ Resolved

**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)

Streaming functionality for the Responses API is feature complete and released.
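
A minimal streaming sketch, reusing the `client` from the first example:

```python
# Stream a response and print text deltas as they arrive.
stream = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Write a haiku about APIs.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```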

---

### Prompt Templates

**Status:** ✅ Resolved

**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)

OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing. This is now supported in Llama Stack.
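
A sketch following OpenAI's reusable-prompt shape; the prompt ID and variables are placeholders:

```python
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    prompt={
        "id": "pmpt_example123",  # placeholder stored-prompt ID
        "version": "2",
        "variables": {"customer_name": "Ada"},
    },
)
```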

---

### Instructions

**Status:** ✅ Resolved

**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)

The Responses API request and response objects now support the *instructions* field.
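
A sketch of the resolved behavior:

```python
# System-style guidance via `instructions`, now echoed on the response object.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="Answer in one short sentence.",
    input="What is Llama Stack?",
)
print(response.instructions)
```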

---
> **Review comment:** This seems confusing now. I think it would make sense to split this into the parts that are still open (filters and user location), which stay here, and the parts that are now done (the aliases), which move to the Resolved Issues section; that requires some rewriting of both.