Athina is building monitoring and evaluation tools for LLM developers.
- Evals SDK: Open-source framework for evaluating LLMs (Python + CLI)
- Platform: Monitor your production inferences and automatically run evals against them
 
Documentation | Quick Start | Running Evals
We have a library of preset evaluators, but you can also write custom evaluators within the Athina framework (a usage sketch follows the list below).
- Context Contains Enough Information: Detect bad or insufficient retrievals.
- Does Response Answer Query: Detect incomplete or irrelevant responses.
- Response Faithfulness: Detect when responses deviate from the provided context.
- Summarization Accuracy: Detect hallucinations and mistakes in summaries.
- Grading Criteria: If X, then fail. Otherwise pass.
- Custom Evals: Custom prompt for LLM-powered evaluation.
- RAGAS: A set of evaluators that return RAGAS metrics.
 
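As a rough illustration, running a preset evaluator and a Grading Criteria eval from Python might look like the sketch below. The exact import paths and names (`athina.keys`, `OpenAiApiKey.set_key`, `DoesResponseAnswerQuery().run`, `GradingCriteria`) are assumptions based on the Evals SDK docs and should be verified against the current `athina-evals` release.

```python
import os

from athina.keys import AthinaApiKey, OpenAiApiKey
from athina.evals import DoesResponseAnswerQuery, GradingCriteria

# Preset evaluators are LLM-powered, so an OpenAI key is required;
# the Athina key is only needed to track results on the platform.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])
AthinaApiKey.set_key(os.environ["ATHINA_API_KEY"])

# Run a preset evaluator against a single query/response pair.
result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)

# Grading Criteria: pass/fail based on a plain-language rule.
GradingCriteria(
    grading_criteria="If the response mentions a competitor, fail. Otherwise, pass."
).run(response="Paris is the capital of France.")
```

The same evaluators can also be run over a dataset in batch mode; see the Running Evals docs for the loaders and batch API.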
Results can also be viewed and tracked on our platform.

Documentation | Demo Video | Sign Up
- UI for monitoring and visibility into your LLM inferences.
- Run evals automatically against logged inferences in production (see the logging sketch after this list).
- Track cost, token usage, response times, feedback, pass rate, and other eval metrics.
- Analytics segmented by Customer ID, Model, Prompt, Environment, and more.
- Topic Classification
- Data Exports
- ... and more
 
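To give a sense of how inferences reach the platform, logging a production inference could look like this minimal sketch. The import paths and parameter names (`athina_logger`, `InferenceLogger.log_inference`, `prompt_slug`, `customer_id`, and so on) are assumptions to check against the current logging documentation.

```python
import os

from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger

AthinaApiKey.set_api_key(os.environ["ATHINA_API_KEY"])

# Log one production inference; any evals configured on the platform
# then run against it automatically, and it appears in the analytics
# (cost, token usage, response time, customer segmentation, etc.).
InferenceLogger.log_inference(
    prompt_slug="customer_support_bot",
    prompt=[{"role": "user", "content": "How do I reset my password?"}],
    response="You can reset your password from the account settings page.",
    language_model_id="gpt-4",
    customer_id="customer_123",
    environment="production",
)
```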
Contact [email protected] if you have any questions.
