
Conversation

@anastasds (Contributor) commented Nov 19, 2025

What does this PR do?

Closes #4123

A set of changes was introduced by the pre-commit hook. The relevant change here is in streaming.py.
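
For context, the fix boils down to something like the following sketch (the function and field names here are illustrative, not the exact code in streaming.py):

```python
# Sketch only: the real change lives in streaming.py; the names below are
# illustrative assumptions, not the actual ones.
def build_chat_completion_params(request) -> dict:
    params = {
        "model": request.model,
        "messages": request.messages,
        "tools": request.tools,
    }
    # The fix: forward the caller's parallel_tool_calls value to the
    # downstream chat completion instead of silently dropping it.
    if request.parallel_tool_calls is not None:
        params["parallel_tool_calls"] = request.parallel_tool_calls
    return params
```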

Test Plan

Manual verification.

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Nov 19, 2025
mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @anastasds please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Nov 19, 2025
anastasds force-pushed the parallel-tool-calls-impl branch from 1dd3d09 to 958d0dc on November 19, 2025 19:37
mergify bot removed the needs-rebase label on Nov 19, 2025
anastasds changed the title from "Pass parallel_tool_calls directly and document intended usage in integration test" to "Pass parallel_tool_calls correctly" on Nov 19, 2025
anastasds changed the title from "Pass parallel_tool_calls correctly" to "feat: Pass parallel_tool_calls correctly" on Nov 19, 2025
@ashwinb (Contributor) commented Nov 19, 2025

I think this PR is a bit too small -- as in, we could do more here?

@anastasds (Contributor, Author) commented Nov 20, 2025

@ashwinb It's a one-line change, yes, but it fixes broken behavior. As a relative newcomer to the codebase, it took me some time to figure out where the problem was - for example, grepping the codebase for "call_" led me down the wrong path for a while.

What do you have in mind for "more"? I had originally added an integration test for this, modeled after the max_tool_calls integration test for the Responses API, only to find after pulling main that it had been removed. I would be happy to add it back, but I don't know whether it would be a reliable test, given that it involves tool calling, which, as far as I know, is unreliable for anything but the largest models.

anastasds requested a review from cdoern as a code owner on December 16, 2025 15:48
…now been implemented

Removed section on rumored issue with parallel tool calls.
Added clarification on the behavior of the `parallel_tool_calls` parameter and its impact on function calling workflows.
Signed-off-by: Anastas Stoyanovsky <[email protected]>
github-actions bot commented Dec 16, 2025

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

feat: Pass parallel_tool_calls correctly

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-node studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

npm install https://pkg.stainless.com/s/llama-stack-client-node/c28750f93fe20a16f34960823b3b8948902741bb/dist.tar.gz
llama-stack-client-kotlin studio · code · diff

generate ⚠️ · lint ✅ · test ⏳

llama-stack-client-go studio · code · diff

Your SDK built successfully.
generate ❗ · lint ❗ · test ❗

go get github.com/stainless-sdks/llama-stack-client-go@db46c3db06b5a3bef7ecf59b6c0d4c7e792d5e82
llama-stack-client-python studio · conflict

There was a conflict between your custom code and your generated changes.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2025-12-16 22:18:44 UTC

@anastasds (Contributor, Author) commented:

@franciscojavierarceo Ah CONTRIBUTING.md is out of date and pins to an older version, thanks for the tip. Fixed.

  object: Literal["response"] = "response"
  output: Sequence[OpenAIResponseOutput]
- parallel_tool_calls: bool | None = True
+ parallel_tool_calls: bool | None = None

A Collaborator commented on this diff:

why the api change?

@anastasds (Contributor, Author) replied:

This parameter is optional, so when the request is passed downstream it should not be set unless the user set it in their initial request. If a downstream provider defaults to False, that should not be overridden by default.

OpenAI currently defaults to true internally, but that may change or the parameter may be removed, so it is best not to set it explicitly. (I originally defined this to default to True and should not have.)
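
As a rough illustration of what I mean (a hypothetical, trimmed-down model, not the actual class):

```python
from pydantic import BaseModel

class ResponseStub(BaseModel):
    # Hypothetical stand-in: leaving the field unset (None) means "defer to
    # the downstream provider's default" rather than forcing a value.
    parallel_tool_calls: bool | None = None

def downstream_params(stub: ResponseStub) -> dict:
    # Only forward fields the caller actually set, so a provider's own
    # default (True or False) is left untouched when the user said nothing.
    return stub.model_dump(exclude_none=True)

print(downstream_params(ResponseStub()))                           # {}
print(downstream_params(ResponseStub(parallel_tool_calls=False)))  # {'parallel_tool_calls': False}
```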

mattf dismissed their stale review on December 16, 2025 16:32:

whitespace fixed.

anastasds closed this on Dec 16, 2025
anastasds reopened this on Dec 16, 2025
@mattf (Collaborator) requested changes Dec 17, 2025:

it's optional w/ a default of true.

if a downstream provider defaults to false, a user will need to know which downstream provider is used in order to understand how their program behaves. if they switch to an implementation with a different downstream provider and a different default, they'll need to understand that and adjust their program.

please don't change this api.

@anastasds (Contributor, Author) commented:

@mattf Since the parameter wasn't being passed downstream, it looks like I'll need to re-record replays. The README in the integration tests directory suggests I'll have to re-record all the failing tests, since the error messages are about missing record hashes. Do I understand correctly?

mattf dismissed their stale review on December 17, 2025 14:04:

api change addressed

anastasds force-pushed the parallel-tool-calls-impl branch from 796c4e4 to 8a924a3 on December 17, 2025 14:52
@anastasds (Contributor, Author) commented Dec 17, 2025

@mattf From researching the failures after recording the missing replays, I see that not all models support parallel_tool_calls, as noted in this answer on the OpenAI forums. There are various failures in the logs now, including 400 errors that look like they correspond to models that don't support this parameter. Testing this parameter with true, false, and unset against a supported OpenAI model does succeed.

With that in mind, perhaps indeed it is best to not set this parameter unless the end user has set it?

From the point of view of maximizing ease of migration from OpenAI to LlamaStack, one would ostensibly first move to LlamaStack using the OpenAI provider to verify functionality before moving to a different provider, so what I'm suggesting does not seem to contradict that goal.
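
If it helps, the kind of check I ran looks roughly like this (the fixture names and tool definition are assumptions, not the actual integration test):

```python
import pytest

@pytest.mark.parametrize("parallel_tool_calls", [True, False, None])
def test_parallel_tool_calls_passthrough(openai_client, text_model_id, parallel_tool_calls):
    # Hypothetical check: send the parameter only when it is explicitly set,
    # and confirm the request succeeds either way.
    kwargs = {}
    if parallel_tool_calls is not None:
        kwargs["parallel_tool_calls"] = parallel_tool_calls
    response = openai_client.responses.create(
        model=text_model_id,
        input="What is the weather in Paris and in Tokyo?",
        tools=[{
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }],
        **kwargs,
    )
    assert len(response.output) > 0  # at least one tool call or message item
```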

@mattf (Collaborator) commented Dec 18, 2025

> From researching the failures after recording the missing replays, I see that not all models support parallel_tool_calls, as noted in this answer on the OpenAI forums. There are various failures in the logs now, including 400 errors that look like they correspond to models that don't support this parameter. Testing this parameter with true, false, and unset against a supported OpenAI model does succeed.
>
> With that in mind, perhaps indeed it is best to not set this parameter unless the end user has set it?
>
> From the point of view of maximizing ease of migration from OpenAI to LlamaStack, one would ostensibly first move to LlamaStack using the OpenAI provider to verify functionality before moving to a different provider, so what I'm suggesting does not seem to contradict that goal.

sounds like you've uncovered a bug in an inference provider. please share a link to the error in the test run.

@anastasds (Contributor, Author) commented Dec 18, 2025

Lots of 500 errors against gpt-4o, which does not support this parameter, such as here, here, and here. That seems to be the majority of the cases.

I think it probably does not make sense to insert parallel_tool_calls=true into the request passed to a provider if the user has not set that parameter at all. As a user, I would probably not want my inference routing middleware inserting parameters that I may not understand. Especially since there are prominent models that don't support this parameter at all, I would still advocate for defaulting this parameter to unset (None) instead of forcibly adding it with the value True when the user has not set it themselves. Again as a user, I certainly would not want to have to debug an issue caused by my middleware inserting parameters that are not supported by the model I am trying to use.


There are also three failing suites due to tests that look for the string hello in the model response, which now fail because the model responded successfully but without saying hello - see here, here, and here - I'm not sure what to do about that, as it seems to be a fragile test. Would one identify the corresponding replays, delete them, and re-record until the model says something with "hello" in it? Manually modify the replay to pretend the model said "hello", considering the variability in LLM responses?

@mattf (Collaborator) commented Dec 18, 2025

@anastasds the 500 errors are opaque. what's the underlying issue?

we should not manually edit the recordings. that'll just mean the next person has to figure out you edited them and make similar manual changes.

we should not be adjusting our stable public api to workaround a provider bug.

@anastasds (Contributor, Author) commented:

@mattf It's not a bug. Some models don't support this parameter. The 500s look like they're all against a model that doesn't support this parameter. Short of maintaining a list of which models do and do not support this parameter, I don't see a better option.

I would also reiterate:

> I think it probably does not make sense to insert parallel_tool_calls=true into the request passed to a provider if the user has not set that parameter at all. As a user, I would probably not want my inference routing middleware inserting parameters that I may not understand. Especially since there are prominent models that don't support this parameter at all, I would still advocate for defaulting this parameter to unset (None) instead of forcibly adding it with the value True when the user has not set it themselves. Again as a user, I certainly would not want to have to debug an issue caused by my middleware inserting parameters that are not supported by the model I am trying to use.

@anastasds (Contributor, Author) commented:

Perhaps a reasonable path forward would be to not set this parameter for now and to plan out some work to provide the supporting infrastructure for multi-turn tool calling (the parallel_tool_calls=false case) in LlamaStack itself, which would be nontrivial. Passing the parameter forward when set, as this PR implements, would provide partial compatibility with models that support it in the meantime.

@mattf (Collaborator) commented Dec 18, 2025

@anastasds please, what error is the stack server receiving that results in the 500?

until we have that, we're just speculating.

parallel_tool_calls is a hint that the caller can handle multiple tools calls in a single response. there's no guarantee that the model will produce multiple tool calls in a single response. even a model trained to return multiple calls may still only return a single call when it could more efficiently return multiple.

if an inference engine is throwing an error because it expects the model will never produce parallel calls, that's a bug in the engine. the engine should instead ignore the hint. for instance, ollama might be turning a suggestion (call multiple tools at the same time if you want) into a requirement (model, you must return parallel tool calls); in that case a bug should be raised with ollama and we can disable parallel tool calls for ollama in the meantime.

however, without knowing the underlying issue, it might also be that the engine is returning parallel calls and stack's agent loop doesn't know how to handle them. that'd be a good situation because we can simply fix that.
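
in code, the caller's side of that hint is just to drain every tool call in an assistant turn before continuing - a sketch (not stack's actual agent loop, just an illustration of the contract):

```python
import json

def run_tool_loop(client, model, messages, tools, tool_impls):
    # Sketch of an agent loop that handles *every* tool call in an assistant
    # turn (the parallel case), not just the first, before continuing.
    while True:
        completion = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = completion.choices[0].message
        if not msg.tool_calls:
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:  # may contain several parallel calls
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
```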

anastasds closed this on Jan 2, 2026

Labels: CLA Signed (managed by the Meta Open Source bot)
Projects: none yet
Development: successfully merging this pull request may close the issue "feat: Implement parallel_tool_calls in Responses API"
4 participants