-
-
Notifications
You must be signed in to change notification settings - Fork 462
Added tool calls support for LLM clients #1694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 11 commits
7316d1d
c7eeee2
aaec45b
81d0e40
cfe3401
a59c8ef
7776246
a8874f8
70eac12
daa2059
4f3bfa0
2447cf8
9841ccf
6000e4c
ef2a7a2
2afcf94
67e9fd2
0066a14
ca6c1a3
3b60c31
3908086
abc37aa
3e65e57
dd48192
5227b76
3fcc077
261b8e7
98ae8c7
1f66c50
1ac608d
2639c81
bdecc8b
1687ecc
be657c3
947ed6a
7d6b53d
b91e41e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,22 +9,25 @@ | |
|
|
||
| EVALUATE_MODEL_FUNCTIONS = [ | ||
| { | ||
| "name": "evaluate_model", | ||
| "description": "Evaluates if the model passes the test", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "passed_test": { | ||
| "type": "boolean", | ||
| "description": "true if the model successfully passes the test", | ||
| }, | ||
| "reason": { | ||
| "type": "string", | ||
| "description": "optional short description of why the model does not pass the test, in 1 or 2 short sentences", | ||
| "type": "function", | ||
| "function": { | ||
| "name": "evaluate_model", | ||
| "description": "Evaluates if the model passes the test", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "passed_test": { | ||
| "type": "boolean", | ||
| "description": "true if the model successfully passes the test", | ||
| }, | ||
| "reason": { | ||
| "type": "string", | ||
| "description": "optional short description of why the model does not pass the test, in 1 or 2 short sentences", | ||
| }, | ||
| }, | ||
| "required": ["passed_test"], | ||
| }, | ||
| }, | ||
| "required": ["passed_test"], | ||
| }, | ||
| ] | ||
|
|
||
|
|
@@ -90,18 +93,18 @@ def evaluate(self, model: BaseModel, dataset: Dataset): | |
| try: | ||
| out = self.llm_client.complete( | ||
| [{"role": "system", "content": prompt}], | ||
| functions=funcs, | ||
| function_call={"name": "evaluate_model"}, | ||
| tools=funcs, | ||
| tool_choice={"type": "function", "function": {"name": "evaluate_model"}}, | ||
| temperature=self.llm_temperature, | ||
| caller_id=self.__class__.__name__, | ||
| ) | ||
| if out.function_call is None or "passed_test" not in out.function_call.args: | ||
| if len(out.tool_calls) != 1 or "passed_test" not in out.tool_calls[0].args: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this specific case, we do not expect, that we can make multiple tool calls, right?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes since we are evaluating a single case we expect it to be evaluated once |
||
| raise LLMGenerationError("Invalid function call arguments received") | ||
| except LLMGenerationError as err: | ||
| errored.append({"message": str(err), "sample": sample}) | ||
| continue | ||
|
|
||
| args = out.function_call.args | ||
| args = out.tool_calls[0].args | ||
| if args["passed_test"]: | ||
| succeeded.append({"input_vars": input_vars, "model_output": model_output, "reason": args.get("reason")}) | ||
| else: | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this moment it is ok for a backward compatibility, but I believe, that by the end of the day we need to keep only "tool" and not "function" naming convention and thus related variables like "function_call", as well as we may want change "LLMFunctionCall" to "LLMToolCall"
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm not sure if
function_callare being used in other PR's and might be used by some users having developed there own test.I guess it's okay to keep
LLMFunctionCallfor now since only tool supported for now is of typefunction. I guess we will need to update it once we have other tools supported by OpenAI.