[GSK-2015,GSK-2012,GSK-1998] LLM tests refactoring #1522

You're missing the check of the debug bool. It should be False by default and checked inside the test. If it's True, the output_ds should be filled.

mattbit · 2023-10-30T21:30:17Z

> You're missing the check of the debug bool. It should be False by default and checked inside the test. If it's True, the output_ds should be filled.

Since I have the dataset, I just return it regardless of whether debug is set. I don’t see the point in withholding it since I already have it. Also, since the LLM-based tests are not deterministic, it’s best to have the debug dataset from the specific run (without having to run again with debug=True, which will give different results).
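
To make the two options in this exchange concrete, here is a minimal sketch. It is not the Giskard implementation: `SketchTestResult`, `run_sketch_test`, and the `always_return` switch are hypothetical names, and plain lists of dicts stand in for a dataset. Only `debug` and `output_ds` come from the discussion above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchTestResult:
    """Hypothetical stand-in for a test result object."""
    passed: bool
    output_ds: Optional[List[dict]] = None  # failing examples, if attached


def run_sketch_test(
    rows: List[dict],
    is_ok: Callable[[dict], bool],
    debug: bool = False,
    always_return: bool = False,
) -> SketchTestResult:
    """Evaluate each row once and optionally attach the failing rows."""
    failures = [row for row in rows if not is_ok(row)]
    result = SketchTestResult(passed=not failures)
    # Reviewer's suggestion: fill output_ds only when debug is True.
    # Author's approach (always_return=True): attach it whenever it exists,
    # so the failures come from this exact, non-deterministic run.
    if debug or always_return:
        result.output_ds = failures
    return result


rows = [{"text": "ok"}, {"text": "bad"}]
is_ok = lambda row: row["text"] == "ok"

gated = run_sketch_test(rows, is_ok)                       # debug=False
debugged = run_sketch_test(rows, is_ok, debug=True)
always = run_sketch_test(rows, is_ok, always_return=True)
```

With the gated variant, seeing the failures requires a second run with `debug=True`, which for an LLM-based check may not reproduce the same failing rows.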

rabah-khalek · 2023-10-30T21:36:28Z

I'm not sure this is well defined. The standard procedure today is that once you click debug in the hub, you re-run the test with debug=True and save the output_ds.

I'm not sure whether the dataset is persisted only when debug is clicked, or whether always providing the dataset means it gets saved twice.

In any case, even with no check on debug, clicking debug will produce a new dataset.

rabah-khalek · 2023-10-30T21:46:26Z

In fact, the debug mechanism is also going to change with https://github.com/Giskard-AI/giskard-hub/pull/172/files, where only the list of indices is now transferred to the backend, and the dataset gets retrieved and filtered rather than created.
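
A rough sketch of that index-based idea, under the assumption that the backend already holds the original dataset: the test reports only the positions of failing rows, and the stored dataset is filtered by those positions. The function names are hypothetical and do not come from the linked PR.

```python
from typing import Callable, Iterable, List


def failing_indices(rows: List[dict], is_ok: Callable[[dict], bool]) -> List[int]:
    """Test side: return only the positions of the failing rows."""
    return [i for i, row in enumerate(rows) if not is_ok(row)]


def filter_by_indices(stored_rows: List[dict], indices: Iterable[int]) -> List[dict]:
    """Backend side: rebuild the debug view from the stored dataset."""
    wanted = set(indices)
    return [row for i, row in enumerate(stored_rows) if i in wanted]


stored = [{"text": "ok"}, {"text": "bad"}, {"text": "ok"}, {"text": "worse"}]
indices = failing_indices(stored, lambda row: row["text"] == "ok")
debug_view = filter_by_indices(stored, indices)
```

Only a short list of integers crosses the wire, and no new dataset artifact has to be created for each debug click.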

mattbit · 2023-10-30T21:50:01Z

> In fact, the debug mechanism is also going to change with https://github.com/Giskard-AI/giskard-hub/pull/172/files, where only the list of indices is now transferred to the backend, and the dataset gets retrieved and filtered rather than created.

Nice! That’s a significant improvement, which also avoids polluting the project artifacts with dozens of temporary datasets.

mattbit · 2023-10-30T21:51:36Z

> I'm not sure whether the dataset is persisted only when debug is clicked, or whether always providing the dataset means it gets saved twice.

I will double-check. It doesn’t seem to be saved in the tests I’ve done, but I may have missed it.

mattbit · 2023-10-31T10:51:06Z

@rabah-khalek I fixed a problem with the single value test and added a couple of unit tests. The debug output_df is not a problem (does not create duplicates).

sonarqubecloud · 2023-10-31T11:06:05Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
3 Code Smells

92.1% Coverage
0.0% Duplication

mattbit changed the title ~~LLM tests refactoring~~ [GSK-2015,GSK-2012,GSK-1998] LLM tests refactoring Oct 30, 2023

mattbit marked this pull request as draft October 30, 2023 14:58

mattbit marked this pull request as ready for review October 30, 2023 15:50

mattbit requested review from andreybavt and rabah-khalek October 30, 2023 15:51

mattbit mentioned this pull request Oct 30, 2023

[GSK-2008] Add metric_name in test results #1523

Merged

rabah-khalek approved these changes Oct 30, 2023

giskard/utils/display.py (resolved)

giskard/scanner/llm/llm_chars_injection_detector.py (resolved)

giskard/testing/tests/llm/output_requirements.py (resolved)

mattbit self-assigned this Oct 30, 2023

mattbit added the scan Created by Linear-GitHub Sync label Oct 30, 2023

mattbit and others added 9 commits October 31, 2023 11:36

Cleanup tests and add documentation

41ec1ca

- closes GSK-2012 (description of LLM tests)
- closes GSK-2015 (cleanup of old tests)

Improve LLM scan & tests dataset names

d155a89

Attempt at fixing GSK-1998

Clean up imports

4c032b4

Remove model parameter from LLM scan tests

fc2cb29

This will be set at the test suite level

Standardize LLM tests naming

85acc5b

Fix test after renaming of AdversarialDataGenerator

677764a

Fix type hint

fe1f602

v2.0.0b29

35033ef

Fix output requirements tests and add tests

a72c2bc

mattbit force-pushed the task/llm-tests-refactoring branch from 6971434 to a72c2bc Compare October 31, 2023 10:37

Merge branch 'main' into task/llm-tests-refactoring

d7a61ad

rabah-khalek approved these changes Oct 31, 2023

rabah-khalek merged commit ed38306 into main Oct 31, 2023

rabah-khalek deleted the task/llm-tests-refactoring branch October 31, 2023 11:07
