[GSK-2015,GSK-2012,GSK-1998] LLM tests refactoring #1522
Depends on #1513
GSK-2012 Add description to tests for LLM scan
GSK-1998 Generate ~meaningful and~ short names for the datasets used by the LLM scan
Datasets uploaded for a test suite generated by the scan should have short (<256 chars) and meaningful names. To reproduce:
looks good to me, minor comments
you're missing the check of the
Since I have the dataset, I just return it regardless of whether
I'm not sure this is well defined. The standard procedure today is that once you click debug in the hub, the test is re-run with debug=True and the output_ds is saved. I'm not sure whether the dataset is persisted only when debug is clicked, or whether always providing the dataset causes it to be saved twice. In any case, even with no check on debug, clicking debug will produce a new dataset.
In fact, the debug mechanism is also going to change with https://github.com/Giskard-AI/giskard-hub/pull/172/files, where only the list of indices is now transferred to the backend, and the dataset is retrieved and only filtered, not created...
Nice! That’s a significant improvement, which also avoids polluting the project artifacts with dozens of temporary datasets.
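To make the index-based approach concrete, here is a minimal sketch of the idea as I understand it from the comment above. It is not the giskard-hub implementation: `StoredDataset` and `filter_by_indices` are hypothetical names, and the real backend may validate or store things differently. The point is only that the debug step sends row indices and the backend filters an already persisted dataset instead of saving a new one.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class StoredDataset:
    """Hypothetical stand-in for a dataset already persisted in the project."""
    name: str
    rows: List[Dict[str, str]]


def filter_by_indices(dataset: StoredDataset, indices: Sequence[int]) -> List[Dict[str, str]]:
    """Return the rows at the given positions without creating a new stored dataset.

    Out-of-range indices are skipped here; this is an assumption of the sketch.
    """
    return [dataset.rows[i] for i in indices if 0 <= i < len(dataset.rows)]


# The dataset is saved once; a debug request only carries the failing row indices.
stored = StoredDataset(
    name="llm_scan_inputs",
    rows=[{"question": "a"}, {"question": "b"}, {"question": "c"}],
)
failing_rows = filter_by_indices(stored, [0, 2, 7])
```

With this shape, repeated debug clicks reuse the same stored dataset, so no temporary datasets accumulate in the project artifacts.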
I will double check. It doesn’t seem to be saved in the tests I’ve done, but I may have missed something.
- closes GSK-2012 (description of LLM tests)
- closes GSK-2015 (cleanup of old tests)
Attempt at fixing GSK-1998
This will be set at the test suite level
6971434 to a72c2bc
@rabah-khalek I fixed a problem with the single value test and added a couple of unit tests. The debug output_df is not a problem (does not create duplicates).
Kudos, SonarCloud Quality Gate passed!