BFCL metric bug fix #30

Open
aman-servicenow wants to merge 1 commit into main from bfcl-metric-fix

@aman-servicenow
Collaborator

Fix: Scalar param and required param check bug

📌 Description

🔗 Related Issue(s)

🛠️ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality including new tasks)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / Code cleanup
  • Maintenance / Chore / Task
  • Other (please describe):

✅ How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

📋 Checklist

  • Code follows project style guidelines
  • Tests have been added/updated (if applicable)
  • Documentation has been updated (if applicable)
  • Linked relevant issue(s)
  • Self-reviewed my code

🙌 Additional Notes

Fixes two issues in the BFCL metric:

  1. Scalar parameter values are never validated

The _compare_tool_call method only checks values for dict- and list-typed parameters. For scalar types (string, integer, float, boolean), it only verifies that the parameter exists and never compares the predicted value against the ground truth. Since the vast majority of BFCL parameters are scalars, a model that outputs base=999 when the answer is base=10 would still be scored as correct. In the standard BFCL evaluator, simple_function_checker has a catch-all check, value not in possible_answer[param], that correctly rejects wrong values for all types. Note that simple_function_checker is the core per-call validator used by all categories (multiple_function_checker and parallel_function_checker_no_order both delegate to it), so this affects simple, multiple, parallel, and parallel_multiple equally.
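The missing scalar comparison can be illustrated with a minimal sketch. The names scalar_value_matches and call_is_correct are hypothetical and are not taken from the metric or from the BFCL evaluator. The sketch assumes the ground truth maps each parameter name to a list of accepted values.

```python
def scalar_value_matches(predicted, accepted_values):
    """Return True if the predicted scalar is one of the accepted ground-truth values."""
    return predicted in accepted_values


def call_is_correct(model_params, possible_answer):
    """Check every predicted parameter against its list of accepted values.

    Hypothetical helper: assumes possible_answer maps name -> list of accepted values.
    """
    return all(
        name in possible_answer
        and scalar_value_matches(value, possible_answer[name])
        for name, value in model_params.items()
    )


# Ground truth in the assumed "list of accepted values" layout.
possible_answer = {"base": [10], "height": [5]}

right_call = {"base": 10, "height": 5}
wrong_call = {"base": 999, "height": 5}  # present, but the value is wrong

print(call_is_correct(right_call, possible_answer))
print(call_is_correct(wrong_call, possible_answer))
```

An existence-only check would accept both calls. Comparing values is what separates base=10 from base=999.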

  2. Only required parameters are checked

The metric loop iterates only over required_params from the tool schema, so optional parameters that have expected values in the ground truth (e.g., "unit": ["units", ""]) are never validated. In contrast, standard BFCL's simple_function_checker iterates over all model_params.items() and also rejects a call that omits a parameter the ground truth does not mark as optional. Because simple_function_checker is the per-call validator that multiple_function_checker and parallel_function_checker_no_order delegate to, this issue also affects simple, multiple, parallel, and parallel_multiple equally.
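The sketch below shows one way to validate every parameter the model produced, not just the required ones. check_all_params is a hypothetical helper, not the metric's or BFCL's actual code. It assumes that an empty string in a parameter's accepted-values list means the parameter may be omitted, an inference from the "unit": ["units", ""] example above.

```python
def check_all_params(model_params, possible_answer):
    """Return a list of error strings; an empty list means the call passes.

    Assumption: "" in a parameter's accepted list marks it as omittable.
    """
    errors = []

    # Pass 1: validate every parameter the model actually produced.
    for name, value in model_params.items():
        if name not in possible_answer:
            errors.append(f"unexpected parameter: {name!r}")
        elif value not in possible_answer[name]:
            errors.append(f"wrong value for {name!r}: {value!r}")

    # Pass 2: flag omitted parameters that the ground truth does not mark as optional.
    for name, accepted in possible_answer.items():
        if name not in model_params and "" not in accepted:
            errors.append(f"missing parameter: {name!r}")

    return errors


ground_truth = {"base": [10], "height": [5], "unit": ["units", ""]}

# "unit" is omitted, which is allowed because "" is among its accepted values.
print(check_all_params({"base": 10, "height": 5}, ground_truth))

# "unit" is optional in the schema but still has to match when supplied.
print(check_all_params({"base": 10, "height": 5, "unit": "meters"}, ground_truth))
```

A loop over required_params alone would never look at "unit", so the second call would pass despite the wrong value.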

