Open
Description
What feature do you want to see added?
Run the LLM evals in GitHub Actions, like the unit tests.
Why do you want this feature? Does it solve a problem?
This would allow us to showcase the robustness of the tool.
How should the feature be implemented?
Add a GitHub Actions workflow similar to the one that runs the existing unit tests.
Consider how to ensure LLM usage can be monitored and restricted.
Consider how to show failures, given that the LLM evals are (by their nature) flaky.
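To make the last two considerations concrete, here is a minimal sketch of one possible approach: cap the number of LLM calls per CI run, and treat an eval as passing when enough repeated attempts succeed rather than requiring every attempt to pass. The names `CallBudget`, `BudgetExceeded`, and `run_eval`, and the defaults of 5 attempts and a 0.8 pass rate, are hypothetical and not taken from this project.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed its LLM call allowance."""


@dataclass
class CallBudget:
    # Hypothetical cap on LLM calls per CI run; not an existing project setting.
    max_calls: int
    used: int = 0

    def spend(self) -> None:
        """Record one LLM call, refusing once the cap has been reached."""
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"LLM call budget of {self.max_calls} exhausted")
        self.used += 1


def run_eval(
    eval_fn: Callable[[], bool],
    budget: CallBudget,
    attempts: int = 5,
    min_pass_rate: float = 0.8,
) -> tuple[bool, float]:
    """Run one flaky eval several times; pass if enough attempts succeed."""
    passes = 0
    for _ in range(attempts):
        budget.spend()  # every attempt counts against the shared budget
        if eval_fn():
            passes += 1
    rate = passes / attempts
    return rate >= min_pass_rate, rate
```

In a workflow step, a script along these lines could print the pass rate for each eval and exit non-zero only when the rate falls below the threshold, so a single flaky attempt does not fail the build while a real regression still does.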
Additional context (if applicable)