[GSK-1528] Refactor issue rendering to decouple from templating #1313

henchaves · 2023-08-17T12:25:48Z

Description

The purpose of this PR is to simplify the classes that inherit from the Issue class used for displaying scan results. We aim to make these classes more minimalistic by reducing the number of properties and adding a summary property for better visualization and serialization.

Additionally, this PR merges the separate per-issue-type templates into a single template, default.html.

Lastly, we have created basic templates with markdown to enable seamless integration of Giskard Scan results with other applications.
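For readers skimming the thread, here is a rough sketch of what a minimal issue with a summary property could look like. The class name `IssueSketch`, its fields, and the keys returned by `summary` are assumptions chosen to mirror the report columns shown later in this conversation; they are not the classes introduced by this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder for empty cells (assumption; the reports in this thread show a dash).
EMPTY_CELL = "-"


@dataclass(frozen=True)
class IssueSketch:
    """Hypothetical minimal issue; field names are illustrative only."""

    group: str                               # e.g. "Performance"
    level: str                               # e.g. "major"
    description: str
    slicing_fn: Optional[str] = None         # textual form of the data slice
    metric: Optional[str] = None             # e.g. "Recall"
    metric_value: Optional[float] = None
    transformation_fn: Optional[str] = None
    deviation: Optional[str] = None

    @property
    def summary(self) -> dict:
        """Flat, template-agnostic view that any renderer can consume."""
        metric = EMPTY_CELL
        if self.metric is not None and self.metric_value is not None:
            metric = f"{self.metric} = {self.metric_value:.3f}"
        return {
            "Vulnerability": self.group,
            "Level": self.level,
            "Data slice": self.slicing_fn or EMPTY_CELL,
            "Metric": metric,
            "Transformation": self.transformation_fn or EMPTY_CELL,
            "Deviation": self.deviation or EMPTY_CELL,
            "Description": self.description,
        }
```

The point of the sketch is that templates only ever read `summary`, so HTML and markdown renderers do not need to know about issue-specific attributes.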

Related Issue

GSK-1528 (available on Linear)

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

linear · 2023-08-17T12:25:52Z

GSK-1528 Refactor issue rendering to decouple from templating

Refactor the issue rendering, moving it from the jinja templates to the issue objects. Each issue should provide standardized attributes that we can use to build the issue info for HTML templates as well as for markdown and CSV export.
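As an illustration of why standardized attributes help, a CSV export can be written once against a fixed column list. This is a minimal sketch: the column names are taken from the report tables in this thread, while `issues_to_csv` and the dict-based input are hypothetical and not part of the Giskard API.

```python
import csv
import io
from typing import Dict, Iterable

# Column names as they appear in the scan report tables in this conversation.
COLUMNS = [
    "Vulnerability",
    "Level",
    "Data slice",
    "Metric",
    "Transformation",
    "Deviation",
    "Description",
]


def issues_to_csv(summaries: Iterable[Dict[str, str]]) -> str:
    """Serialize standardized issue summaries to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for summary in summaries:
        # Missing attributes become empty cells; unknown keys are ignored.
        writer.writerow({column: summary.get(column, "") for column in COLUMNS})
    return buffer.getvalue()
```

Because the `csv` module handles quoting, slice expressions that contain double quotes or commas survive the round trip without manual escaping.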

rabah-khalek · 2023-08-30T15:13:55Z

> Fine about strings.
>
> You can do report.to_markdown("my_report.md") and open in a markdown editor to check.

dumping into file doesn't seem to be supported @mattbit: https://github.com/Giskard-AI/giskard/blob/c5e0f262ccf453e95c7934f2be418b660e61364b/python-client/giskard/scanner/report.py#L55

mattbit · 2023-08-30T15:16:51Z

> > Fine about strings.
> >
> > You can do report.to_markdown("my_report.md") and open in a markdown editor to check.
>
> dumping into file doesn't seem to be supported @mattbit:
>
> https://github.com/Giskard-AI/giskard/blob/c5e0f262ccf453e95c7934f2be418b660e61364b/python-client/giskard/scanner/report.py#L55

whoops!

mattbit · 2023-08-30T15:27:33Z

@rabah-khalek well spotted, I added the markdown file export and a couple of tests to ensure this is covered for both html and markdown.
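For context, the behaviour discussed here (return the markdown as a string, or write it to a file when a filename is passed) can be sketched as below. Only the call `report.to_markdown("my_report.md")` comes from this thread; the class `ReportSketch`, its constructor, and the return convention are assumptions, not the actual implementation in report.py.

```python
from pathlib import Path
from typing import Optional


class ReportSketch:
    """Hypothetical stand-in for a scan report that already holds rendered markdown."""

    def __init__(self, markdown_body: str) -> None:
        self._markdown_body = markdown_body

    def to_markdown(self, filename: Optional[str] = None) -> Optional[str]:
        """Return the markdown text, or write it to `filename` if one is given."""
        if filename is None:
            return self._markdown_body
        # Assumed convention: writing to disk returns None.
        Path(filename).write_text(self._markdown_body, encoding="utf-8")
        return None
```

A test for this kind of method would typically write into a temporary directory and compare the file contents with the string returned by the no-argument call.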

rabah-khalek · 2023-08-30T15:32:23Z

I think it looks great now! I have no other format requests.

I will, however, review the PR more carefully tomorrow and submit a review. Thanks!

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Performance	major	`Name` contains "mr"	Recall = 0.021	—	-96.85% than global	For records in your dataset where `Name` contains "mr", the recall is 96.8% lower than the global recall.
Performance	major	`Sex` == "male"	Recall = 0.111	—	-83.19% than global	For records in your dataset where `Sex` == "male", the recall is 96.8% lower than the global recall.
Performance	major	`Pclass` == 3	Precision = 0.475	—	-36.89% than global	For records in your dataset where `Pclass` == 3, the recall is 96.8% lower than the global recall.
Performance	medium	`Name` contains "master"	Accuracy = 0.708	—	-10.00% than global	For records in your dataset where `Name` contains "master", the recall is 96.8% lower than the global recall.
Performance	medium	`Parch` == 0	Recall = 0.600	—	-9.20% than global	For records in your dataset where `Parch` == 0, the recall is 96.8% lower than the global recall.
Performance	medium	`Parch` == 2	Precision = 0.692	—	-8.10% than global	For records in your dataset where `Parch` == 2, the recall is 96.8% lower than the global recall.
Performance	medium	`Embarked` == "S"	Recall = 0.611	—	-7.52% than global	For records in your dataset where `Embarked` == "S", the recall is 96.8% lower than the global recall.
Performance	medium	`Pclass` == 1	Accuracy = 0.733	—	-6.82% than global	For records in your dataset where `Pclass` == 1, the recall is 96.8% lower than the global recall.
Performance	medium	`Name` contains "miss"	Accuracy = 0.737	—	-6.31% than global	For records in your dataset where `Name` contains "miss", the recall is 96.8% lower than the global recall.
Performance	medium	`Embarked` == "Q"	Precision = 0.714	—	-5.18% than global	For records in your dataset where `Embarked` == "Q", the recall is 96.8% lower than the global recall.
Robustness	medium	—	Fail rate = 0.067	Transform to title case	6.67% of tested samples changed prediction after perturbation	When we perturb the content of feature “Name” with the transformation “Transform to title case” (see examples below), your model changes its prediction in about 6.67% of the cases. We expected the predictions not to be affected by this transformation.
Overconfidence	major	`Name` contains "mr"	Overconfidence rate = 0.620	—	+59.19% than global	For records in your dataset where `Name` contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.0% of the wrong predictions in the data slice).
Overconfidence	major	`text_length(Name)` < 23.500	Overconfidence rate = 0.588	—	+51.03% than global	For records in your dataset where `text_length(Name)` < 23.500, we found a significantly higher number of overconfident wrong predictions (20 samples, corresponding to 58.82352941176471% of the wrong predictions in the data slice).
Overconfidence	major	`Sex` == "male"	Overconfidence rate = 0.533	—	+36.94% than global	For records in your dataset where `Sex` == "male", we found a significantly higher number of overconfident wrong predictions (32 samples, corresponding to 53.333333333333336% of the wrong predictions in the data slice).
Overconfidence	major	`Parch` == 0	Overconfidence rate = 0.477	—	+22.45% than global	For records in your dataset where `Parch` == 0, we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 47.69230769230769% of the wrong predictions in the data slice).
Overconfidence	medium	`SibSp` == 0	Overconfidence rate = 0.443	—	+13.65% than global	For records in your dataset where `SibSp` == 0, we found a significantly higher number of overconfident wrong predictions (27 samples, corresponding to 44.26229508196721% of the wrong predictions in the data slice).
Spurious Correlation	minor	`Sex` == "female"	Nominal association (Theil's U) = 0.697	—	Prediction Survived = `yes` for 92.67% of samples in the slice	Data slice `Sex` == "female" seems to be highly associated to prediction Survived = `yes` (92.67% of predictions in the data slice).
Spurious Correlation	minor	`Sex` == "male"	Nominal association (Theil's U) = 0.697	—	Prediction Survived = `no` for 96.28% of samples in the slice	Data slice `Sex` == "male" seems to be highly associated to prediction Survived = `no` (96.28% of predictions in the data slice).
Spurious Correlation	minor	`Name` contains "mr"	Nominal association (Theil's U) = 0.609	—	Prediction Survived = `no` for 98.48% of samples in the slice	Data slice `Name` contains "mr" seems to be highly associated to prediction Survived = `no` (98.48% of predictions in the data slice).

rabah-khalek · 2023-08-30T15:34:06Z

I thought at first skipping a row per each vulnerability type is a nice touch, but I don't feel strongly about it. WDYT @mattbit ?

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Performance	major	`Name` contains "mr"	Recall = 0.021	—	-96.85% than global	For records in your dataset where `Name` contains "mr", the recall is 96.8% lower than the global recall.

Spurious Correlation	minor	`Name` contains "mr"	Nominal association (Theil's U) = 0.609	—	Prediction Survived = `no` for 98.48% of samples in the slice	Data slice `Name` contains "mr" seems to be highly associated to prediction Survived = `no` (98.48% of predictions in the data slice).

mattbit · 2023-08-30T15:56:31Z

> I thought at first skipping a row per each vulnerability type is a nice touch, but I don't feel strongly about it. WDYT @mattbit ?

Looks good in your example but I think it would really depend on how the table is rendered… I would stick with no separator. If you want we can create separate tables per issue type.

rabah-khalek · 2023-08-30T16:08:22Z

I think it'll be nice to have a to_markdown(self, template="github") option where we output something like:

Performance vulnerabilities

Level	Data slice	Metric	Transformation	Deviation	Description
major	`Name` contains "mr"	Recall = 0.021	—	-96.85% than global	For records in your dataset where `Name` contains "mr", the recall is 96.8% lower than the global recall.

Spurious Correlation vulnerabilities

Level	Data slice	Metric	Transformation	Deviation	Description
minor	`Name` contains "mr"	Nominal association (Theil's U) = 0.609	—	Prediction Survived = `no` for 98.48% of samples in the slice	Data slice `Name` contains "mr" seems to be highly associated to prediction Survived = `no` (98.48% of predictions in the data slice).

while keeping the default a big table without separator like here: https://github.com//pull/1313#issuecomment-1699402654
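To make the proposal concrete, here is a minimal sketch of a renderer that switches between one big table and one table per vulnerability type. Only the `template="github"` value comes from this comment; the function name `render_markdown`, the default name `"summary"`, the `###` heading level, and the dict-based rows are assumptions for illustration.

```python
from typing import Dict, List

COLUMNS = [
    "Vulnerability",
    "Level",
    "Data slice",
    "Metric",
    "Transformation",
    "Deviation",
    "Description",
]


def _table(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as a single pipe table restricted to the given columns."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)


def render_markdown(rows: List[Dict[str, str]], template: str = "summary") -> str:
    """Hypothetical renderer: 'summary' is one table, 'github' is one table per type."""
    if template == "summary":
        return _table(rows, COLUMNS)
    if template == "github":
        # Group rows by vulnerability type, keeping first-seen order.
        grouped: Dict[str, List[Dict[str, str]]] = {}
        for row in rows:
            grouped.setdefault(row["Vulnerability"], []).append(row)
        # The type is already in the heading, so drop that column in each table.
        per_group_columns = [c for c in COLUMNS if c != "Vulnerability"]
        sections = [
            f"### {name} vulnerabilities\n\n{_table(group, per_group_columns)}"
            for name, group in grouped.items()
        ]
        return "\n\n".join(sections)
    raise ValueError(f"Unknown template: {template!r}")
```

Keeping the grouping inside the renderer means the issue objects stay unchanged; only the presentation differs between the two templates.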

mattbit · 2023-08-30T16:28:02Z

> I think it'll be nice to have a to_markdown(self, template="github") option where we output something like:

@rabah-khalek Your wish is granted! 613fa92

rabah-khalek · 2023-08-31T07:43:03Z

Look at that! It makes me want to wrap up the GitHub CI/CD action. Thanks @mattbit

Performance issues (10)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Performance	major	`Name` contains "mr"	Recall = 0.021	—	-96.85% than global	For records in your dataset where `Name` contains "mr", the recall is 96.8% lower than the global recall.

Robustness issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Robustness	medium	—	Fail rate = 0.067	Transform to title case	6.67% of tested samples changed prediction after perturbation	When we perturb the content of feature “Name” with the transformation “Transform to title case” (see examples below), your model changes its prediction in about 6.67% of the cases. We expected the predictions not to be affected by this transformation.

Overconfidence issues (5)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Overconfidence	major	`Name` contains "mr"	Overconfidence rate = 0.620	—	+59.19% than global	For records in your dataset where `Name` contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.0% of the wrong predictions in the data slice).

Spurious Correlation issues (3)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Spurious Correlation	minor	`Sex` == "female"	Nominal association (Theil's U) = 0.697	—	Prediction Survived = `yes` for 92.67% of samples in the slice	Data slice `Sex` == "female" seems to be highly associated to prediction Survived = `yes` (92.67% of predictions in the data slice).

rabah-khalek

overall LGTM, just a few small suggestions.

python-client/giskard/scanner/calibration/overconfidence_detector.py

python-client/giskard/scanner/calibration/underconfidence_detector.py

python-client/giskard/scanner/performance/performance_bias_detector.py

andreybavt

Generally looks good for a high-level review; I left one comment.

andreybavt · 2023-08-31T15:26:33Z

python-client/giskard/scanner/correlation/spurious_correlation_detector.py

-    "Cramer's V": "test_cramer_v",
-    "Mutual information": "test_mutual_information",
-    "Theil's U": "test_theil_u",
+    "cramer": "test_cramer_v",


In places like this, can you reference the test function itself, not just its name? It'll be difficult to remember to modify these strings if we ever rename test methods automatically.

mattbit · 2023-08-31

I think it was done like this so that the tests are only imported lazily (this will only be called on generate_test_suite). I kept this in the current refactoring: tests are initialized lazily. (We wanted to minimize the big imports of testing, as far as I understood.)

In principle these functions are part of our public API (testing), so I exclude the option of a light-hearted renaming.
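The trade-off in this exchange (string names allow lazy imports, but can drift after a rename) can be sketched as follows. This is a generic illustration, not the code in spurious_correlation_detector.py: the registry, its keys, and `resolve_test` are hypothetical, and standard-library functions stand in for the real test functions so the example runs on its own.

```python
import importlib
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a short key to (module, attribute).
# Stand-ins from the standard library replace the real test functions here.
_TEST_REGISTRY: Dict[str, Tuple[str, str]] = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def resolve_test(key: str) -> Callable:
    """Import the owning module only when the test is actually requested."""
    module_name, attribute = _TEST_REGISTRY[key]
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def check_registry() -> None:
    """Guard against rename drift: every registered name must resolve to a callable."""
    for key in _TEST_REGISTRY:
        assert callable(resolve_test(key)), key
```

Running something like `check_registry()` in the test suite would flag a renamed function at CI time while keeping the heavy import out of the detector module's import path.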

rabah-khalek · 2023-09-01T08:17:21Z

there're some minor code smells to be silenced

mattbit · 2023-09-01T08:25:17Z

> there're some minor code smells to be silenced

right, i need to fix this

sonarqubecloud · 2023-09-01T11:02:46Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

90.8% Coverage
0.0% Duplication

henchaves added 23 commits August 11, 2023 12:02

Make data_leakage.html more similar with default.html

c23c705

Create default_new and delete data_leakage templates

8848d13

Integrate old performance template to default_new

96e0f27

Delete performance template

d6ae6eb

Integrate old robustness template to default_new

a6b5c53

Delete robustness template

a41afdd

Integrate old overconfidence template to default_new

68fee09

Delete overconfidence template

1723363

Fix overconfidence description

be264a6

Integrate spurious corr template to default_new

ae7e81c

Delete old spurious correlation template

7747bae

Integrate underconfidence to default_new

e550c81

Delete underconfidence template

3a27e32

Integrate stochasticity to default_new

b53294f

Delete stochasticity template

8cd886a

Integrate llm_toxicity to default_new

529a905

Delete llm_toxicity template

8c8b300

Integrate llm_harmfulness to default_new

075462c

Delete llm_harmfulness

b334ccd

Integrate llm_stereotype to default_new

a0eb42f

Delete llm_stereotype

e88c4f4

Delete old default template

cb8a28a

Rename default_new to default

56a50eb

henchaves self-assigned this Aug 17, 2023

henchaves added 5 commits August 18, 2023 00:46

Create basic markdown templates

29abc0d

Create methods to render markdown with iPython

148f4f3

Update _main_content.md

b7712d9

Update default.md

1cf459c

Control render type with class attribute

132935b

mattbit added 2 commits August 30, 2023 17:16

Remove unused templates

00f6eb8

Add metric value to markdown summary

3e57ea8

Add markdown file export and tests

b55aa32

Merge branch 'main' into feature/gsk-1528-refactor-issue-rendering-to…

aee36d2

…-decouple-from-templating

Add github markdown template

613fa92

@rabah-khalek 👁️

Merge branch 'main' into feature/gsk-1528-refactor-issue-rendering-to…

5d0c641

…-decouple-from-templating

rabah-khalek suggested changes Aug 31, 2023

Merge branch 'main' into feature/gsk-1528-refactor-issue-rendering-to…

502459b

…-decouple-from-templating

andreybavt reviewed Aug 31, 2023

mattbit added 2 commits August 31, 2023 19:01

Merge branch 'main' into feature/gsk-1528-refactor-issue-rendering-to…

8c210d0

…-decouple-from-templating

Merge branch 'main' into feature/gsk-1528-refactor-issue-rendering-to…

e6a40ef

…-decouple-from-templating

rabah-khalek approved these changes Sep 1, 2023

mattbit added 4 commits September 1, 2023 11:05

Avoid enclosing scope definition of vars for example extractor

99501cf

Fix test

9256f50

Update pytest config

21fbd66

Update lockfile

292f870

mattbit merged commit 5758c0d into main Sep 1, 2023

Hartorn deleted the feature/gsk-1528-refactor-issue-rendering-to-decouple-from-templating branch September 13, 2023 11:32

5 participants
