
Conversation

@anastasds (Contributor) commented Nov 19, 2025

What does this PR do?

Closes #4123

A set of changes was introduced by the pre-commit hook. The relevant change here is in streaming.py.
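
For context, the fix boils down to something like the following sketch (the function and field names here are illustrative, not the exact code in streaming.py):

```python
# Sketch only: the real change lives in streaming.py; the names below are
# illustrative assumptions, not the actual ones.
def build_chat_completion_params(request) -> dict:
    params = {
        "model": request.model,
        "messages": request.messages,
        "tools": request.tools,
    }
    # The fix: forward the caller's parallel_tool_calls value to the
    # downstream chat completion instead of silently dropping it.
    if request.parallel_tool_calls is not None:
        params["parallel_tool_calls"] = request.parallel_tool_calls
    return params
```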

Test Plan

Manual verification.

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Nov 19, 2025
mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @anastasds please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Nov 19, 2025
anastasds force-pushed the parallel-tool-calls-impl branch from 1dd3d09 to 958d0dc on November 19, 2025 19:37
mergify bot removed the needs-rebase label on Nov 19, 2025
anastasds changed the title from "Pass parallel_tool_calls directly and document intended usage in integration test" to "Pass parallel_tool_calls correctly" on Nov 19, 2025
anastasds changed the title from "Pass parallel_tool_calls correctly" to "feat: Pass parallel_tool_calls correctly" on Nov 19, 2025
@ashwinb (Contributor) commented Nov 19, 2025

I think this PR is a bit too small -- as in, we could do more here?

@anastasds (Contributor, Author) commented Nov 20, 2025

@ashwinb It's a one-line change, yes, but it fixes broken behavior. As a relative newcomer to the codebase, it took me some time to figure out where the problem was - for example, grepping the codebase for "call_" led me down the wrong path for a while.

What do you have in mind for "more"? I had originally added an integration test for this, modeled after the max_tool_calls integration test for the Responses API, only to find after pulling main that it had been removed. I would be happy to add it back, but I don't know whether it would be a reliable test, given that it involves tool calling, which, as far as I know, is unreliable for anything but the largest models.

anastasds requested a review from cdoern as a code owner on December 16, 2025 15:48
…now been implemented

Removed section on rumored issue with parallel tool calls.
Added clarification on the behavior of the `parallel_tool_calls` parameter and its impact on function calling workflows.
Signed-off-by: Anastas Stoyanovsky <[email protected]>
github-actions bot commented Dec 16, 2025

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

feat: Pass parallel_tool_calls correctly

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-node studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

npm install https://pkg.stainless.com/s/llama-stack-client-node/c28750f93fe20a16f34960823b3b8948902741bb/dist.tar.gz
llama-stack-client-kotlin studio · code · diff

generate ⚠️ · lint ✅ · test ⏳

llama-stack-client-go studio · code · diff

Your SDK built successfully.
generate ❗ · lint ❗ · test ❗

go get github.com/stainless-sdks/llama-stack-client-go@db46c3db06b5a3bef7ecf59b6c0d4c7e792d5e82
llama-stack-client-python studio · conflict

There was a conflict between your custom code and your generated changes.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2025-12-16 22:18:44 UTC

@anastasds (Contributor, Author) commented:

@franciscojavierarceo Ah CONTRIBUTING.md is out of date and pins to an older version, thanks for the tip. Fixed.

  object: Literal["response"] = "response"
  output: Sequence[OpenAIResponseOutput]
- parallel_tool_calls: bool | None = True
+ parallel_tool_calls: bool | None = None

A Collaborator commented on this diff:

why the api change?

@anastasds (Contributor, Author) replied:

This parameter is optional, so when the request is passed downstream it should not be set unless the user set it in their initial request. If a downstream provider defaults to False, that should not be overridden by default.

OpenAI currently defaults to true internally, but that may change or the parameter may be removed, so it is best not to set it explicitly. (I originally defined this to default to True and should not have.)
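
As a rough illustration of what I mean (a hypothetical, trimmed-down model, not the actual class):

```python
from pydantic import BaseModel

class ResponseStub(BaseModel):
    # Hypothetical stand-in: leaving the field unset (None) means "defer to
    # the downstream provider's default" rather than forcing a value.
    parallel_tool_calls: bool | None = None

def downstream_params(stub: ResponseStub) -> dict:
    # Only forward fields the caller actually set, so a provider's own
    # default (True or False) is left untouched when the user said nothing.
    return stub.model_dump(exclude_none=True)

print(downstream_params(ResponseStub()))                           # {}
print(downstream_params(ResponseStub(parallel_tool_calls=False)))  # {'parallel_tool_calls': False}
```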

mattf dismissed their stale review on December 16, 2025 16:32:

whitespace fixed.

anastasds closed this on Dec 16, 2025
anastasds reopened this on Dec 16, 2025
@mattf (Collaborator) requested changes Dec 17, 2025:

it's optional w/ a default of true.

if a downstream provider defaults to false, a user will need to know which downstream provider is used in order to understand how their program behaves. if they switch to an implementation with a different downstream provider and a different default, they'll need to understand that and adjust their program.

please don't change this api.

@anastasds (Contributor, Author) commented:

@mattf Since the parameter wasn't being passed downstream, it looks like I'll need to re-record replays. The README in the integration tests directory suggests I'll have to re-record all the failing tests, since the error messages are about missing record hashes. Do I understand correctly?

mattf dismissed their stale review on December 17, 2025 14:04:

api change addressed

anastasds force-pushed the parallel-tool-calls-impl branch from 796c4e4 to 8a924a3 on December 17, 2025 14:52
@anastasds (Contributor, Author) commented Dec 17, 2025

@mattf From researching the failures after recording the missing replays, I see that not all models support parallel_tool_calls, as noted in this answer on the OpenAI forums. There are various failures in the logs now, including 400 errors that look like they correspond to models that don't support this parameter. Testing this parameter with true, false, and unset against a supported OpenAI model does succeed.

With that in mind, perhaps indeed it is best to not set this parameter unless the end user has set it?

From the point of view of maximizing ease of migration from OpenAI to LlamaStack, one would ostensibly first move to LlamaStack using the OpenAI provider to verify functionality before moving to a different provider, so what I'm suggesting does not seem to contradict that goal.
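
If it helps, the kind of check I ran looks roughly like this (the fixture names and tool definition are assumptions, not the actual integration test):

```python
import pytest

@pytest.mark.parametrize("parallel_tool_calls", [True, False, None])
def test_parallel_tool_calls_passthrough(openai_client, text_model_id, parallel_tool_calls):
    # Hypothetical check: send the parameter only when it is explicitly set,
    # and confirm the request succeeds either way.
    kwargs = {}
    if parallel_tool_calls is not None:
        kwargs["parallel_tool_calls"] = parallel_tool_calls
    response = openai_client.responses.create(
        model=text_model_id,
        input="What is the weather in Paris and in Tokyo?",
        tools=[{
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }],
        **kwargs,
    )
    assert len(response.output) > 0  # at least one tool call or message item
```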

@mattf (Collaborator) commented Dec 18, 2025

> From researching the failures after recording the missing replays, I see that not all models support parallel_tool_calls, as noted in this answer on the OpenAI forums. There are various failures in the logs now, including 400 errors that look like they correspond to models that don't support this parameter. Testing this parameter with true, false, and unset against a supported OpenAI model does succeed.
>
> With that in mind, perhaps indeed it is best to not set this parameter unless the end user has set it?
>
> From the point of view of maximizing ease of migration from OpenAI to LlamaStack, one would ostensibly first move to LlamaStack using the OpenAI provider to verify functionality before moving to a different provider, so what I'm suggesting does not seem to contradict that goal.

sounds like you've uncovered a bug in an inference provider. please share a link to the error in the test run.

@anastasds (Contributor, Author) commented Dec 18, 2025

Lots of 500 errors against gpt-4o, which does not support this parameter, such as here, here, and here. That seems to be the majority of the cases.

I think it probably does not make sense to insert parallel_tool_calls=true into the request passed to a provider if the user has not set that parameter at all. As a user, I would probably not want my inference routing middleware inserting parameters that I may not understand. Especially since there are prominent models that don't support this parameter at all, I would still advocate for defaulting this parameter to unset (None) instead of forcibly adding it with the value True when the user has not set it themselves. Again as a user, I certainly would not want to have to debug an issue caused by my middleware inserting parameters that are not supported by the model I am trying to use.


There are also three failing suites due to tests that look for the string hello in the model response, which now fail because the model responded successfully but without saying hello - see here, here, and here - I'm not sure what to do about that, as it seems to be a fragile test. Would one identify the corresponding replays, delete them, and re-record until the model says something with "hello" in it? Manually modify the replay to pretend the model said "hello", considering the variability in LLM responses?

@mattf (Collaborator) commented Dec 18, 2025

@anastasds the 500 errors are opaque. what's the underlying issue?

we should not manually edit the recordings. that'll just mean the next person has to figure out you edited them and make similar manual changes.

we should not be adjusting our stable public api to workaround a provider bug.

@anastasds (Contributor, Author) commented:

@mattf It's not a bug. Some models don't support this parameter. The 500s look like they're all against a model that doesn't support this parameter. Short of maintaining a list of which models do and do not support this parameter, I don't see a better option.

I would also reiterate:

> I think it probably does not make sense to insert parallel_tool_calls=true into the request passed to a provider if the user has not set that parameter at all. As a user, I would probably not want my inference routing middleware inserting parameters that I may not understand. Especially since there are prominent models that don't support this parameter at all, I would still advocate for defaulting this parameter to unset (None) instead of forcibly adding it with the value True when the user has not set it themselves. Again as a user, I certainly would not want to have to debug an issue caused by my middleware inserting parameters that are not supported by the model I am trying to use.

@anastasds (Contributor, Author) commented:

Perhaps a reasonable path forward would be to not set this parameter for now and to plan out some work to provide the supporting infrastructure for multi-turn tool calling (the parallel_tool_calls=false case) in LlamaStack itself, which would be nontrivial. Passing the parameter forward when set, as this PR implements, would provide partial compatibility with models that support it in the meantime.

@mattf (Collaborator) commented Dec 18, 2025

@anastasds please, what error is the stack server receiving that results in the 500?

until we have that, we're just speculating.

parallel_tool_calls is a hint that the caller can handle multiple tools calls in a single response. there's no guarantee that the model will produce multiple tool calls in a single response. even a model trained to return multiple calls may still only return a single call when it could more efficiently return multiple.

if an inference engine is throwing an error because it expects the model will never produce parallel calls, that's a bug in the engine. the engine should instead ignore the hint. for instance, ollama might be turning a suggestion (call multiple tools at the same time if you want) into a requirement (model, you must return parallel tool calls); in that case a bug should be raised with ollama and we can disable parallel tool calls for ollama in the meantime.

however, without knowing the underlying issue, it might also be that the engine is returning parallel calls and stack's agent loop doesn't know how to handle them. that'd be a good situation because we can simply fix that.
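
in code, the caller's side of that hint is just to drain every tool call in an assistant turn before continuing - a sketch (not stack's actual agent loop, just an illustration of the contract):

```python
import json

def run_tool_loop(client, model, messages, tools, tool_impls):
    # Sketch of an agent loop that handles *every* tool call in an assistant
    # turn (the parallel case), not just the first, before continuing.
    while True:
        completion = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = completion.choices[0].message
        if not msg.tool_calls:
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:  # may contain several parallel calls
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
```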

anastasds closed this on Jan 2, 2026

Labels: CLA Signed (managed by the Meta Open Source bot)
Projects: none yet
Development: successfully merging this pull request may close the issue "feat: Implement parallel_tool_calls in Responses API"
4 participants