feat: Pass parallel_tool_calls correctly #4196
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @anastasds please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
…gration test Signed-off-by: Anastas Stoyanovsky <[email protected]>
Force-pushed 1dd3d09 to 958d0dc
I think this PR is a bit too small -- as in, we could do more here?

@ashwinb It's a one-line change, yes, but it fixes broken behavior. As a relative newcomer to the codebase, it took some time to figure out where the problem was -- for example, by grepping the codebase. What do you have in mind for "more"? I originally added an integration test for this after the
…now been implemented Removed section on rumored issue with parallel tool calls. Added clarification on the behavior of the `parallel_tool_calls` parameter and its impact on function calling workflows.
Signed-off-by: Anastas Stoyanovsky <[email protected]>
@franciscojavierarceo Ah
  object: Literal["response"] = "response"
  output: Sequence[OpenAIResponseOutput]
- parallel_tool_calls: bool | None = True
+ parallel_tool_calls: bool | None = None
why the api change?
This parameter is optional, so when it is passed downstream it should not be set unless the user set it in their initial request. If a downstream provider defaults to False, that default should not be overridden.
OpenAI currently defaults to true internally, but that may change, the parameter may be removed, etc., so it is best not to set it explicitly. (I originally defined this to default to True and should not have.)
Signed-off-by: Anastas Stoyanovsky <[email protected]>
mattf left a comment
it's optional w/ a default of true.
if a downstream provider defaults to false, a user will need to know which downstream provider is used to understand how their program is working. if they switch to an implementation with a different downstream provider and a different default, they'll need to understand that and adjust their program.
please don't change this api.
Signed-off-by: Anastas Stoyanovsky <[email protected]>
@mattf Since the parameter wasn't being passed downstream, it looks like I'll need to re-record replays. Looking at the README in the integration tests directory, it looks like I'll have to re-record all the failing tests, since the error messages are about missing record hashes. Do I understand correctly?
Signed-off-by: Anastas Stoyanovsky <[email protected]>
Force-pushed 796c4e4 to 8a924a3
@mattf From researching the failures after recording the missing replays, I see that not all models support `parallel_tool_calls`. With that in mind, perhaps it is indeed best not to set this parameter unless the end user has set it. From the point of view of maximizing ease of migration from OpenAI to Llama Stack, one would presumably first move to Llama Stack using the OpenAI provider to verify functionality before moving to a different provider, so what I'm suggesting does not seem to contradict that goal.
sounds like you've uncovered a bug in an inference provider. please share a link to the error in the test run.
Lots of 500 errors against gpt-4o, which does not support this parameter, such as here, here, and here. That seems to be the majority of the cases. I think it probably does not make sense to insert this parameter by default. There are also three failing suites that are due to tests that look for the string
@anastasds the 500 errors are opaque. what's the underlying issue? we should not manually edit the recordings; that'll just mean the next person has to figure out that you edited them and make similar manual changes. we should not be adjusting our stable public api to work around a provider bug.
@mattf It's not a bug. Some models don't support this parameter, and the 500s look like they're all against a model that doesn't support it. Short of maintaining a list of which models do and do not support this parameter, I don't see a better option. I would also reiterate:
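The "maintain a list" option mentioned above could look roughly like the following. All names here are illustrative assumptions, not llama-stack's actual code or a real model capability list: the idea is to strip `parallel_tool_calls` before calling models known to reject it, rather than changing the stable API default.

```python
# Hypothetical capability filter -- the set contents and function name
# are placeholders for illustration only.
MODELS_REJECTING_PARALLEL_TOOL_CALLS = {"example-model-a"}

def filter_unsupported_params(model: str, params: dict) -> dict:
    params = dict(params)  # avoid mutating the caller's dict
    if model in MODELS_REJECTING_PARALLEL_TOOL_CALLS:
        # Drop the parameter silently for models that 500 on it.
        params.pop("parallel_tool_calls", None)
    return params

print(filter_unsupported_params(
    "example-model-a",
    {"parallel_tool_calls": True, "stream": True},
))
# {'stream': True}
```

The obvious downside, as the comment notes, is that such a list has to be maintained as providers and models change.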
Perhaps a reasonable path forward would be to not set this parameter for now, and to plan out work to provide the supporting infrastructure for multi-turn tool calling (the
@anastasds please, what error is the stack server receiving that results in the 500? until we have that, we're just speculating.
if an inference engine is throwing an error because it expects the model will never produce parallel calls, that's a bug in the engine. the engine should instead ignore the hint. for instance, ollama might be turning a suggestion (call multiple tools at the same time if you want) into a requirement (model, you must return parallel tool calls). a bug should be raised with ollama, and we can disable parallel tool calls for ollama in the meantime. however, without knowing the underlying issue, it might also be that the engine is returning parallel calls and stack's agent loop doesn't know how to handle them. that'd be a good situation, because we can simply fix that.
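The second scenario described above -- an agent loop that doesn't know how to handle multiple tool calls in one assistant message -- can be sketched as follows. The message shapes mimic the OpenAI chat-completions format, but the function names and registry are illustrative assumptions, not llama-stack's actual agent loop.

```python
import json

def run_tool_calls(tool_calls: list[dict], registry: dict) -> list[dict]:
    # Execute EVERY tool call in the assistant message, not just the
    # first one, and return one tool-result message per call.
    results = []
    for call in tool_calls:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(fn(**args)),
        })
    return results

registry = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
calls = [
    {"id": "1", "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}},
    {"id": "2", "function": {"name": "mul", "arguments": '{"a": 2, "b": 3}'}},
]
print(run_tool_calls(calls, registry))
# [{'role': 'tool', 'tool_call_id': '1', 'content': '5'},
#  {'role': 'tool', 'tool_call_id': '2', 'content': '6'}]
```

A loop that assumed exactly one call per message would silently drop the second call here; handling the full list is what makes parallel tool calls safe to enable.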
What does this PR do?
Closes #4123
A set of changes were introduced by the pre-commit hook. The relevant change here is in streaming.py.
Test Plan
Manual verification.