Models Tool Call Evaluation #163861
Replies: 2 comments
-
Your Product Feedback Has Been Submitted 🎉

Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩

- Where to look to see what's shipping 👀
- What you can do in the meantime 💻

As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐
-
Evaluating LLM tool calling is crucial, since a tool call is a critical kind of message an LLM emits. Evaluation should cover:

- Parameter values: did the LLM use the correct values?
- Tool selection: did the LLM choose the right tool from multiple options?
- Multi-turn tool calling: how accurate is the LLM in complex conversational scenarios?
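As a rough sketch of what checking the first two of these might look like in code (the `score_tool_call` helper and the `get_weather` example below are hypothetical, and the `{"name", "arguments"}` shape assumes an OpenAI-style function call):

```python
import json

def score_tool_call(actual: dict, expected: dict) -> dict:
    """Score one emitted tool call against an expected reference call.

    Both dicts are assumed to follow an OpenAI-style function-call shape:
    {"name": <tool name>, "arguments": <JSON-encoded string>}.
    """
    # Tool selection: did the model pick the right tool?
    tool_ok = actual.get("name") == expected.get("name")

    # Parameter values: parse the arguments and compare field by field.
    actual_args = json.loads(actual.get("arguments", "{}"))
    expected_args = json.loads(expected.get("arguments", "{}"))
    matched = sum(1 for k, v in expected_args.items()
                  if actual_args.get(k) == v)
    param_score = matched / len(expected_args) if expected_args else 1.0

    return {"tool_selected": tool_ok, "param_score": param_score}

# Hypothetical case: right tool, but one of two argument values is wrong.
expected = {"name": "get_weather",
            "arguments": '{"city": "Berlin", "unit": "celsius"}'}
actual = {"name": "get_weather",
          "arguments": '{"city": "Berlin", "unit": "fahrenheit"}'}

print(score_tool_call(actual, expected))
# {'tool_selected': True, 'param_score': 0.5}
```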
Select Topic Area
Product Feedback
Body
One piece of functionality that is used heavily with LLMs but that does not have good evals today is tool calling. At the end of the day, a tool call is just another message that an LLM emits, so it should be possible to evaluate that message just like any other chat message (see the sketch after this list) to verify things like:

- Parameter values: did the LLM use the correct values?
- Tool selection: did the LLM choose the right tool from multiple options?
- Multi-turn tool calling: how accurate is the LLM in complex conversational scenarios?
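A minimal sketch of that framing, assuming the common messages-list chat format in which tool calls ride along on assistant messages; the `calculator` tool and the transcript itself are made up for illustration:

```python
# A tool call is just another assistant message in the transcript, so an
# eval can assert on it the way it would assert on ordinary text output.
transcript = [
    {"role": "user", "content": "What's 2 + 2?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"type": "function",
                     "function": {"name": "calculator",
                                  "arguments": '{"expression": "2 + 2"}'}}]},
    {"role": "tool", "content": "4"},
    {"role": "assistant", "content": "2 + 2 is 4."},
]

def tool_calls_in(messages):
    """Yield every function call emitted anywhere in a conversation,
    which is what a multi-turn eval needs to walk and check."""
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            yield call["function"]

# Evaluate the transcript like any other expected-output check.
calls = list(tool_calls_in(transcript))
assert len(calls) == 1
assert calls[0]["name"] == "calculator"
```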