Skip to content

Conversation

@henchaves
Copy link
Member

@henchaves henchaves commented Aug 17, 2023

Description

The purpose of this PR is to simplify the classes that inherit from the Issue class used for displaying scan results. We aim to make these classes more minimalistic by reducing the number of properties and adding a summary property for better visualization and serialization.

Additionally, this PR combines all templates linked to a single type of issue into a single one called default.html.

Lastly, we have created basic templates with markdown to enable seamless integration of Giskard Scan results with other applications.

Related Issue

GSK-1528 (available on Linear)

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

@henchaves henchaves self-assigned this Aug 17, 2023
@linear
Copy link

linear bot commented Aug 17, 2023

GSK-1528 Refactor issue rendering to decouple from templating

Refactor the issue rendering, moving from the jinja templates to the issue objects. Each issue should provide standardized attributes that we can use to create the issue info that will be used for both HTML templates and markdown, csv export.

@rabah-khalek
Copy link
Contributor

Fine about strings.

You can do report.to_markdown("my_report.md") and open in a markdown editor to check.

dumping into file doesn't seem to be supported @mattbit: https://github.com/Giskard-AI/giskard/blob/c5e0f262ccf453e95c7934f2be418b660e61364b/python-client/giskard/scanner/report.py#L55

@mattbit
Copy link
Member

mattbit commented Aug 30, 2023

Fine about strings.

You can do report.to_markdown("my_report.md") and open in a markdown editor to check.

dumping into file doesn't seem to be supported @mattbit:

https://github.com/Giskard-AI/giskard/blob/c5e0f262ccf453e95c7934f2be418b660e61364b/python-client/giskard/scanner/report.py#L55

whoops!

@mattbit
Copy link
Member

mattbit commented Aug 30, 2023

@rabah-khalek well spotted, I added the markdown file export and a couple of tests to ensure this is covered for both html and markdown.

@rabah-khalek
Copy link
Contributor

I think it looks great now! I have no other format requests.

I will however review the PR more carefully tomorrow and submit a review, thanks!.

Vulnerability Level Data slice Metric Transformation Deviation Description
Performance major Name contains "mr" Recall = 0.021 -96.85% than global For records in your dataset where Name contains "mr", the recall is 96.8% lower than the global recall.
Performance major Sex == "male" Recall = 0.111 -83.19% than global For records in your dataset where Sex == "male", the recall is 96.8% lower than the global recall.
Performance major Pclass == 3 Precision = 0.475 -36.89% than global For records in your dataset where Pclass == 3, the recall is 96.8% lower than the global recall.
Performance medium Name contains "master" Accuracy = 0.708 -10.00% than global For records in your dataset where Name contains "master", the recall is 96.8% lower than the global recall.
Performance medium Parch == 0 Recall = 0.600 -9.20% than global For records in your dataset where Parch == 0, the recall is 96.8% lower than the global recall.
Performance medium Parch == 2 Precision = 0.692 -8.10% than global For records in your dataset where Parch == 2, the recall is 96.8% lower than the global recall.
Performance medium Embarked == "S" Recall = 0.611 -7.52% than global For records in your dataset where Embarked == "S", the recall is 96.8% lower than the global recall.
Performance medium Pclass == 1 Accuracy = 0.733 -6.82% than global For records in your dataset where Pclass == 1, the recall is 96.8% lower than the global recall.
Performance medium Name contains "miss" Accuracy = 0.737 -6.31% than global For records in your dataset where Name contains "miss", the recall is 96.8% lower than the global recall.
Performance medium Embarked == "Q" Precision = 0.714 -5.18% than global For records in your dataset where Embarked == "Q", the recall is 96.8% lower than the global recall.
Robustness medium Fail rate = 0.067 Transform to title case 6.67% of tested samples changed prediction after perturbation When we perturb the content of feature “Name” with the transformation “Transform to title case” (see examples below), your model changes its prediction in about 6.67% of the cases. We expected the predictions not to be affected by this transformation.
Overconfidence major Name contains "mr" Overconfidence rate = 0.620 +59.19% than global For records in your dataset where Name contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.0% of the wrong predictions in the data slice).
Overconfidence major text_length(Name) < 23.500 Overconfidence rate = 0.588 +51.03% than global For records in your dataset where text_length(Name) < 23.500, we found a significantly higher number of overconfident wrong predictions (20 samples, corresponding to 58.82352941176471% of the wrong predictions in the data slice).
Overconfidence major Sex == "male" Overconfidence rate = 0.533 +36.94% than global For records in your dataset where Sex == "male", we found a significantly higher number of overconfident wrong predictions (32 samples, corresponding to 53.333333333333336% of the wrong predictions in the data slice).
Overconfidence major Parch == 0 Overconfidence rate = 0.477 +22.45% than global For records in your dataset where Parch == 0, we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 47.69230769230769% of the wrong predictions in the data slice).
Overconfidence medium SibSp == 0 Overconfidence rate = 0.443 +13.65% than global For records in your dataset where SibSp == 0, we found a significantly higher number of overconfident wrong predictions (27 samples, corresponding to 44.26229508196721% of the wrong predictions in the data slice).
Spurious Correlation minor Sex == "female" Nominal association (Theil's U) = 0.697 Prediction Survived = yes for 92.67% of samples in the slice Data slice Sex == "female" seems to be highly associated to prediction Survived = yes (92.67% of predictions in the data slice).
Spurious Correlation minor Sex == "male" Nominal association (Theil's U) = 0.697 Prediction Survived = no for 96.28% of samples in the slice Data slice Sex == "male" seems to be highly associated to prediction Survived = no (96.28% of predictions in the data slice).
Spurious Correlation minor Name contains "mr" Nominal association (Theil's U) = 0.609 Prediction Survived = no for 98.48% of samples in the slice Data slice Name contains "mr" seems to be highly associated to prediction Survived = no (98.48% of predictions in the data slice).

@rabah-khalek
Copy link
Contributor

I thought at first skipping a row per each vulnerability type is a nice touch, but I don't feel strongly about it. WDYT @mattbit ?

Vulnerability Level Data slice Metric Transformation Deviation Description
Performance major Name contains "mr" Recall = 0.021 -96.85% than global For records in your dataset where Name contains "mr", the recall is 96.8% lower than the global recall.
Spurious Correlation minor Name contains "mr" Nominal association (Theil's U) = 0.609 Prediction Survived = no for 98.48% of samples in the slice Data slice Name contains "mr" seems to be highly associated to prediction Survived = no (98.48% of predictions in the data slice).

@mattbit
Copy link
Member

mattbit commented Aug 30, 2023

I thought at first skipping a row per each vulnerability type is a nice touch, but I don't feel strongly about it. WDYT @mattbit ?

Looks good in your example but I think it would really depend on how the table is rendered… I would stick with no separator. If you want we can create separate tables per issue type.

@rabah-khalek
Copy link
Contributor

rabah-khalek commented Aug 30, 2023

I think it'll be nice to have a to_markdown(self, template="github") option where we output something like:

Performance vulnerabilities
Level Data slice Metric Transformation Deviation Description
major Name contains "mr" Recall = 0.021 -96.85% than global For records in your dataset where Name contains "mr", the recall is 96.8% lower than the global recall.
Spurious Correlation vulnerabilities
Level Data slice Metric Transformation Deviation Description
minor Name contains "mr" Nominal association (Theil's U) = 0.609 Prediction Survived = no for 98.48% of samples in the slice Data slice Name contains "mr" seems to be highly associated to prediction Survived = no (98.48% of predictions in the data slice).
while keeping the default a big table without separator like here: https://github.com//pull/1313#issuecomment-1699402654

@mattbit
Copy link
Member

mattbit commented Aug 30, 2023

I think it'll be nice to have a to_markdown(self, template="github") option where we output something like:

@rabah-khalek Your wish is granted! 613fa92

@rabah-khalek
Copy link
Contributor

Look at that! it makes me want to wrap up the gh cicd action. Thanks @mattbit

Performance issues (10)
Vulnerability Level Data slice Metric Transformation Deviation Description
Performance major Name contains "mr" Recall = 0.021 -96.85% than global For records in your dataset where Name contains "mr", the recall is 96.8% lower than the global recall.
Robustness issues (1)
Vulnerability Level Data slice Metric Transformation Deviation Description
Robustness medium Fail rate = 0.067 Transform to title case 6.67% of tested samples changed prediction after perturbation When we perturb the content of feature “Name” with the transformation “Transform to title case” (see examples below), your model changes its prediction in about 6.67% of the cases. We expected the predictions not to be affected by this transformation.
Overconfidence issues (5)
Vulnerability Level Data slice Metric Transformation Deviation Description
Overconfidence major Name contains "mr" Overconfidence rate = 0.620 +59.19% than global For records in your dataset where Name contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.0% of the wrong predictions in the data slice).
Spurious Correlation issues (3)
Vulnerability Level Data slice Metric Transformation Deviation Description
Spurious Correlation minor Sex == "female" Nominal association (Theil's U) = 0.697 Prediction Survived = yes for 92.67% of samples in the slice Data slice Sex == "female" seems to be highly associated to prediction Survived = yes (92.67% of predictions in the data slice).

Copy link
Contributor

@rabah-khalek rabah-khalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, just a few small suggestions.

Copy link
Contributor

@andreybavt andreybavt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good for a high level review, I left 1 comment

"Cramer's V": "test_cramer_v",
"Mutual information": "test_mutual_information",
"Theil's U": "test_theil_u",
"cramer": "test_cramer_v",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in places like this can you reference a test function itself, not just it's name? It'll be difficult to remember to modify these strings in case we do automatic test method renaming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was done like this to only import the tests lazily (this will only be called on generate_test_suite). I kept this in the current refactoring, tests are initialized lazily. (We wanted to minimize the big imports of testing as far as I understood.)

In principle these functions are part of our public API (testing), so I exclude the option of a light-hearted renaming.

@rabah-khalek
Copy link
Contributor

there're some minor code smells to be silenced

@mattbit
Copy link
Member

mattbit commented Sep 1, 2023

there're some minor code smells to be silenced

right, i need to fix this

@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 1, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

90.8% 90.8% Coverage
0.0% 0.0% Duplication

@mattbit mattbit merged commit 5758c0d into main Sep 1, 2023
@Hartorn Hartorn deleted the feature/gsk-1528-refactor-issue-rendering-to-decouple-from-templating branch September 13, 2023 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scan Created by Linear-GitHub Sync

Development

Successfully merging this pull request may close these issues.

5 participants