Skip to content

Conversation

@mattbit
Copy link
Member

@mattbit mattbit commented Oct 30, 2023

  • Clean up duplicate LLM tests
  • Add clear documentation to each test
  • Standardize the LLM test names
  • Improve naming of the generated datasets

TODO

  • Check if test datasets are duplicated when running with debug = False
  • Add tests for tests

@mattbit
Copy link
Member Author

mattbit commented Oct 30, 2023

Depends on #1513

@mattbit mattbit changed the title LLM tests refactoring [GSK-2015,GSK-2012,GSK-1998] LLM tests refactoring Oct 30, 2023
@linear
Copy link

linear bot commented Oct 30, 2023

@mattbit mattbit marked this pull request as draft October 30, 2023 14:58
@mattbit mattbit marked this pull request as ready for review October 30, 2023 15:50
Copy link
Contributor

@rabah-khalek rabah-khalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, minor comments

@rabah-khalek
Copy link
Contributor

rabah-khalek commented Oct 30, 2023

you're missing the check of the debug bool, It should be False by default, and checked inside the test. If it's true, the output_ds should be filled

@mattbit
Copy link
Member Author

mattbit commented Oct 30, 2023

you're missing the check of the debug bool, It should be False by default, and checked inside the test. If it's true, the output_ds should be filled

Since I have the dataset, I just return it regardless of whether debug is set or not. I don’t see the point in not setting it since I already have it. Also since the LLM-based tests are not deterministic, it’s best to have the debug dataset from the specific run (without having to run again with debug=True, which will give different results).

@rabah-khalek
Copy link
Contributor

I'm not sure this is well defined. The standard procedure today is that once you click debug in the hub, you re-run the test with debug=True, you save the output_ds.

Not sure if the dataset is persisted only when the debug is clicked, or whether by always providing the dataset, the dataset gets saved twice.

But in any case, even with no check on debug, the debug click will produce a new dataset either way.

@rabah-khalek
Copy link
Contributor

In fact the mechanism of debug is also going to change with this https://github.com/Giskard-AI/giskard-hub/pull/172/files, where only the list of indices are now transfer to the backend, and the dataset gets retrieved and only filtered, not created...

@mattbit
Copy link
Member Author

mattbit commented Oct 30, 2023

In fact the mechanism of debug is also going to change with this https://github.com/Giskard-AI/giskard-hub/pull/172/files, where only the list of indices are now transfer to the backend, and the dataset gets retrieved and only filtered, not created...

Nice! That’s a significant improvement, which also avoids polluting the project artifacts with dozens of temporary datasets.

@mattbit
Copy link
Member Author

mattbit commented Oct 30, 2023

Not sure if the dataset is persisted only when the debug is clicked, or whether by always providing the dataset, the dataset gets saved twice.

I will double check. It doesn’t seems it’s saved in the tests I’ve done, but I may have missed that.

@mattbit mattbit self-assigned this Oct 30, 2023
@mattbit mattbit added the scan Created by Linear-GitHub Sync label Oct 30, 2023
@mattbit mattbit force-pushed the task/llm-tests-refactoring branch from 6971434 to a72c2bc Compare October 31, 2023 10:37
@mattbit
Copy link
Member Author

mattbit commented Oct 31, 2023

@rabah-khalek I fixed a problem with the single value test and added a couple of unit tests. The debug output_df is not a problem (does not create duplicates).

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

92.1% 92.1% Coverage
0.0% 0.0% Duplication

@rabah-khalek rabah-khalek merged commit ed38306 into main Oct 31, 2023
@rabah-khalek rabah-khalek deleted the task/llm-tests-refactoring branch October 31, 2023 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scan Created by Linear-GitHub Sync

Development

Successfully merging this pull request may close these issues.

3 participants