BFCL metric bug fix by aman-servicenow · Pull Request #30 · ServiceNow/AU-Harness

aman-servicenow · 2026-04-03T20:27:40Z

Fix: Scalar param and required param check bug

📌 Description

🔗 Related Issue(s)

🛠️ Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality including new tasks)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactor / Code cleanup
Maintenance / Chore / Task
Other (please describe):

✅ How Has This Been Tested?

Unit tests
Integration tests
Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

📋 Checklist

Code follows project style guidelines
Tests have been added/updated (if applicable)
Documentation has been updated (if applicable)
Linked relevant issue(s)
Self-reviewed my code

🙌 Additional Notes

Fixes two issues in the BFCL metric:

Scalar parameter values are never validated

The _compare_tool_call method only compares values for dict- and list-typed parameters. For scalar types (string, integer, float, boolean), it only verifies that the parameter exists and never compares the predicted value against the ground truth. Since the vast majority of BFCL parameters are scalars, a model outputting base=999 when the answer is base=10 would still be scored as correct. In the standard BFCL evaluator, simple_function_checker has a catch-all check, value not in possible_answer[param], that correctly rejects wrong values for all types. Note that simple_function_checker is the core per-call validator used by all categories: multiple_function_checker and parallel_function_checker_no_order both delegate to it, so this affects simple, multiple, parallel, and parallel_multiple equally.
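The gap can be illustrated with a minimal sketch. This is not the code in bfcl_metric.py: the two helper functions and the height parameter are hypothetical, and the ground truth is assumed to map each parameter name to a list of accepted values, as in the possible_answer structure mentioned above.

```python
def exists_only_check(predicted_args, possible_answer):
    # Behaviour described above for scalars: passes as long as
    # every expected parameter name is present in the prediction.
    return all(name in predicted_args for name in possible_answer)


def value_check(predicted_args, possible_answer):
    # Additionally require each predicted value to be one of the
    # accepted ground-truth values for that parameter.
    return all(
        name in predicted_args and predicted_args[name] in possible_answer[name]
        for name in possible_answer
    )


# Hypothetical ground truth and prediction for a single tool call.
ground_truth = {"base": [10], "height": [5]}
prediction = {"base": 999, "height": 5}

print(exists_only_check(prediction, ground_truth))  # True: wrong value goes unnoticed
print(value_check(prediction, ground_truth))        # False: base=999 is rejected
```

Type checking is deliberately left out of this sketch; it only shows why a presence check alone cannot catch base=999.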

Only required parameters are checked

The metric's loop iterates only over required_params from the tool schema, so optional parameters that have expected values in the ground truth (e.g., "unit": ["units", ""]) are never validated. In contrast, standard BFCL's simple_function_checker iterates over all model_params.items() and also rejects a call that omits a parameter unless the ground truth marks that parameter as optional. As with the scalar issue, this affects simple, multiple, parallel, and parallel_multiple equally, because every category delegates to simple_function_checker.
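A rough sketch of the behaviour described for the standard checker follows. The function name check_all_params and the error strings are hypothetical, and the sketch assumes that an empty string in a parameter's accepted-values list is what marks it as optional, as the "unit": ["units", ""] example suggests.

```python
def check_all_params(predicted_args, possible_answer):
    """Return a list of error strings; an empty list means the call matches."""
    errors = []

    # Validate every parameter the model supplied, required or optional.
    for name, value in predicted_args.items():
        if name not in possible_answer:
            errors.append(f"unexpected parameter: {name}")
        elif value not in possible_answer[name]:
            errors.append(f"wrong value for {name}: {value!r}")

    # A parameter may be omitted only if "" is among its accepted values
    # (assumed convention for marking a parameter optional).
    for name, accepted in possible_answer.items():
        if name not in predicted_args and "" not in accepted:
            errors.append(f"missing parameter: {name}")

    return errors


possible_answer = {"base": [10], "unit": ["units", ""]}

print(check_all_params({"base": 10}, possible_answer))
# []
print(check_all_params({"base": 10, "unit": "meters"}, possible_answer))
# ["wrong value for unit: 'meters'"]
print(check_all_params({"unit": "units"}, possible_answer))
# ['missing parameter: base']
```

Iterating over the supplied arguments rather than over required_params is what lets the second call fail: unit is optional, but once the model provides it, the value still has to be one of the accepted answers.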

Fix: Scalar param and required param check bug

Update bfcl_metric.py

f201733

aman-servicenow requested review from akshaykalkunte and nhhoang96 April 3, 2026 20:27

aman-servicenow self-assigned this Apr 3, 2026

aman-servicenow wants to merge 1 commit into main from bfcl-metric-fix
