Fix: Scalar param and required param check bug
📌 Description
Fixes two issues in the BFCL metric:
The _compare_tool_call method only compares values for dict- and list-typed parameters. For scalar types (string, integer, float, boolean), it only verifies that the parameter exists and never compares the predicted value against the ground truth. Since the vast majority of BFCL parameters are scalars, a model outputting base=999 when the answer is base=10 would still score as correct. In the standard BFCL evaluator, simple_function_checker has a catch-all check, value not in possible_answer[param], that correctly rejects wrong values for all types. Note that simple_function_checker is the core per-call validator used by all categories (multiple_function_checker and parallel_function_checker_no_order both delegate to it), so this affects simple, multiple, parallel, and parallel_multiple equally.
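To make the scalar problem concrete, here is a minimal sketch contrasting a presence-only check with a value check. It is not the actual _compare_tool_call or simple_function_checker code: the function names presence_only_check and value_check, and the sample ground truth, are hypothetical and only illustrate the behaviour described above.

```python
from __future__ import annotations

from typing import Any


def presence_only_check(predicted: dict[str, Any],
                        possible_answer: dict[str, list[Any]]) -> bool:
    """Hypothetical stand-in for the buggy behaviour: scalars pass if the key exists."""
    for name, accepted in possible_answer.items():
        if name not in predicted:
            return False
        value = predicted[name]
        # Only container types are compared; scalar values fall through unchecked.
        if isinstance(value, (dict, list)) and value not in accepted:
            return False
    return True


def value_check(predicted: dict[str, Any],
                possible_answer: dict[str, list[Any]]) -> bool:
    """Hypothetical fixed behaviour: every value must be one of the accepted answers."""
    for name, accepted in possible_answer.items():
        if name not in predicted:
            return False
        # Catch-all membership test, applied to scalars and containers alike.
        if predicted[name] not in accepted:
            return False
    return True


# Illustrative data: each parameter maps to its list of accepted answers.
ground_truth = {"base": [10], "height": [5]}
wrong_call = {"base": 999, "height": 5}
right_call = {"base": 10, "height": 5}
```

With this data, the presence-only version accepts wrong_call because base is present, while the value check rejects it and still accepts right_call.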
The metric loop iterates only over required_params from the tool schema, so optional parameters with expected values in the ground truth (e.g., "unit": ["units", ""]) are never validated. In contrast, standard BFCL's simple_function_checker iterates over all model_params.items() and also rejects a call that omits a parameter the ground truth does not mark as optional. As with the first issue, every category delegates to simple_function_checker, so this affects simple, multiple, parallel, and parallel_multiple equally.
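The second point can be sketched the same way. The snippet below is an illustration, not the standard evaluator's code: check_all_params is a hypothetical name, and it assumes that an empty string in a parameter's accepted-answer list (as in the "unit" example above) is how the ground truth marks that parameter as optional.

```python
from __future__ import annotations

from typing import Any

# Assumption: "" in the accepted list means the parameter may be omitted.
OPTIONAL_MARKER = ""


def check_all_params(predicted: dict[str, Any],
                     possible_answer: dict[str, list[Any]]) -> list[str]:
    """Return a list of error messages; an empty list means the call is accepted."""
    errors: list[str] = []

    # Validate every parameter the model supplied, required or optional.
    for name, value in predicted.items():
        if name not in possible_answer:
            errors.append(f"unexpected parameter: {name}")
        elif value not in possible_answer[name]:
            errors.append(f"wrong value for {name}: {value!r}")

    # Flag omitted parameters unless the ground truth allows omission.
    for name, accepted in possible_answer.items():
        if name not in predicted and OPTIONAL_MARKER not in accepted:
            errors.append(f"missing parameter: {name}")

    return errors


# Illustrative ground truth: "unit" is optional, "base" is not.
ground_truth = {"base": [10], "unit": ["units", ""]}
```

Iterating over the predicted parameters rather than the schema's required list is what lets a wrong optional value such as unit="meters" be caught, while omitting unit entirely remains acceptable.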