Models Tool Call Evaluation #163861
Replies: 2 comments
-
Your Product Feedback Has Been Submitted 🎉

Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩

- Where to look to see what's shipping 👀
- What you can do in the meantime 💻

As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐
-
Evaluating LLM tool calling is crucial, since a tool call is a critical kind of message an LLM emits. Evaluation should cover:

- Parameter values: did the LLM use the correct values?
- Tool selection: did the LLM choose the right tool from multiple options?
- Multi-turn tool calling: how accurate is the LLM in complex conversational scenarios?
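As a rough sketch of what checking the first two of these might look like in code (the `score_tool_call` helper and the `get_weather` example below are hypothetical, and the `{"name", "arguments"}` shape assumes an OpenAI-style function call):

```python
import json

def score_tool_call(actual: dict, expected: dict) -> dict:
    """Score one emitted tool call against an expected reference call.

    Both dicts are assumed to follow an OpenAI-style function-call shape:
    {"name": <tool name>, "arguments": <JSON-encoded string>}.
    """
    # Tool selection: did the model pick the right tool?
    tool_ok = actual.get("name") == expected.get("name")

    # Parameter values: parse the arguments and compare field by field.
    actual_args = json.loads(actual.get("arguments", "{}"))
    expected_args = json.loads(expected.get("arguments", "{}"))
    matched = sum(1 for k, v in expected_args.items()
                  if actual_args.get(k) == v)
    param_score = matched / len(expected_args) if expected_args else 1.0

    return {"tool_selected": tool_ok, "param_score": param_score}

# Hypothetical case: right tool, but one of two argument values is wrong.
expected = {"name": "get_weather",
            "arguments": '{"city": "Berlin", "unit": "celsius"}'}
actual = {"name": "get_weather",
          "arguments": '{"city": "Berlin", "unit": "fahrenheit"}'}

print(score_tool_call(actual, expected))
# {'tool_selected': True, 'param_score': 0.5}
```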
Select Topic Area
Product Feedback
Body
One piece of functionality that is used heavily with LLMs but that does not have good evals today is tool calling. At the end of the day, a tool call is just another message that an LLM emits, so it should be possible to evaluate that message just like any other chat message (see the sketch after this list) to verify things like:

- Parameter values: did the LLM use the correct values?
- Tool selection: did the LLM choose the right tool from multiple options?
- Multi-turn tool calling: how accurate is the LLM in complex conversational scenarios?
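A minimal sketch of that framing, assuming the common messages-list chat format in which tool calls ride along on assistant messages; the `calculator` tool and the transcript itself are made up for illustration:

```python
# A tool call is just another assistant message in the transcript, so an
# eval can assert on it the way it would assert on ordinary text output.
transcript = [
    {"role": "user", "content": "What's 2 + 2?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"type": "function",
                     "function": {"name": "calculator",
                                  "arguments": '{"expression": "2 + 2"}'}}]},
    {"role": "tool", "content": "4"},
    {"role": "assistant", "content": "2 + 2 is 4."},
]

def tool_calls_in(messages):
    """Yield every function call emitted anywhere in a conversation,
    which is what a multi-turn eval needs to walk and check."""
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            yield call["function"]

# Evaluate the transcript like any other expected-output check.
calls = list(tool_calls_in(transcript))
assert len(calls) == 1
assert calls[0]["name"] == "calculator"
```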