Conversation

@AbSsEnT
Contributor

@AbSsEnT AbSsEnT commented Aug 9, 2023

Added logging of SHAP information as part of the WandB integration.

@AbSsEnT AbSsEnT added the enhancement and feature labels Aug 9, 2023
@AbSsEnT AbSsEnT requested a review from rabah-khalek August 9, 2023 11:59
@AbSsEnT AbSsEnT self-assigned this Aug 9, 2023
@linear

linear bot commented Aug 9, 2023

GSK-1533 SHAP logging in wandb

let's create a function that takes in a giskard model and dataset, runs SHAP on all data samples, and, to start with, outputs the Shapley values of each feature per sample as an array/dataframe.

let's start with numerical and categorical values.

we'll extend it for text later on.

we can then find a way to present these outputs as interactive graphs in wandb.

These already exist in model_explanation.py; check what needs to be added in order to get the most out of them for wandb.
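
For reference, the per-sample output described in the issue can be illustrated without any library. The sketch below is not the code from this PR and does not use the shap package: it computes exact Shapley values for a tiny toy model by enumerating feature orderings, with features outside a coalition taking their value from a single background example. The names shapley_values, explain_dataset, toy_predict and the toy data are all hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, sample, background):
    """Exact Shapley values of `predict` at `sample`, relative to `background`.

    Features that are not yet "switched on" keep their background value.
    """
    features = list(sample)
    contributions = {name: 0.0 for name in features}
    for order in permutations(features):
        current = dict(background)
        previous = predict(current)
        for name in order:
            # Switch one feature from its background value to the sample value
            # and credit the change in prediction to that feature.
            current[name] = sample[name]
            value = predict(current)
            contributions[name] += value - previous
            previous = value
    n_orders = factorial(len(features))
    return {name: total / n_orders for name, total in contributions.items()}


def explain_dataset(predict, rows, background):
    """One dict of per-feature Shapley values per data sample."""
    return [shapley_values(predict, row, background) for row in rows]


def toy_predict(row):
    """Hypothetical model with one numerical and one categorical feature."""
    bonus = 5.0 if row["city"] == "Paris" else 0.0
    return 2.0 * row["age"] + bonus


background = {"age": 30, "city": "Lyon"}
rows = [{"age": 40, "city": "Paris"}]
print(explain_dataset(toy_predict, rows, background))
# [{'age': 20.0, 'city': 5.0}]
```

The contributions in each row sum to the prediction minus the background prediction (85 - 60 = 25 here). The cost grows factorially with the number of features, so this is only workable as an illustration; a real implementation would rely on an estimator such as the ones the shap package provides.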

@AbSsEnT AbSsEnT requested a review from rabah-khalek August 10, 2023 14:23
Contributor Author

@AbSsEnT AbSsEnT left a comment

Done

from typing import Any, Iterable
from dataclasses import dataclass

import wandb
Contributor

import wandb shouldn't be here; as we said, a module-level import would break when wandb is not installed. Please embed it only where we need it.

Contributor Author

The point is that I am only going to use the "shap_results.py" module when I need to import the ShapResults class, which requires wandb. I do not need anything else from this module that does not use wandb, i.e. I am going to use this module only if wandb is installed. Otherwise, I need to put the wandb import into all the private functions and ShapResults.to_wandb(), which looks like overhead. WDYT?

Contributor

@rabah-khalek rabah-khalek Aug 16, 2023

It's because we decided to make explain_with_shap a global function that returns ShapResult, so that users can use it without necessarily wanting to log it with wandb (maybe later we will want to log it to mlflow or other MLOps tools).

Importing the same library several times is not an issue in Python, btw; see: https://stackoverflow.com/questions/37067414/python-import-multiple-times.

If you don't like the idea of import wandb in every plotting function, you can refactor them into wandb_utils.py; I think that's a cleaner solution.
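
To make the design point in this comment concrete, here is a minimal sketch of a result object that stays usable without wandb because the import is deferred to the one method that needs it. It is not the ShapResult class from this PR: the class name ShapResultSketch, its fields and the mean_abs_importance helper are hypothetical, and to_wandb assumes an active run started with wandb.init().

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShapResultSketch:
    """Hypothetical container: one row of per-feature values per sample."""

    feature_names: List[str]
    values: List[List[float]]

    def mean_abs_importance(self) -> Dict[str, float]:
        """Mean absolute value per feature; needs no optional dependency."""
        n_samples = len(self.values)
        return {
            name: sum(abs(row[i]) for row in self.values) / n_samples
            for i, name in enumerate(self.feature_names)
        }

    def to_wandb(self) -> None:
        """Log the values as a table to the active WandB run."""
        # Deferred import: only callers of this method need wandb installed.
        import wandb

        table = wandb.Table(columns=self.feature_names, data=self.values)
        wandb.log({"shap_values": table})
```

With this layout, importing the module and calling mean_abs_importance never touches wandb; only to_wandb can fail when the package is missing.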

Contributor Author

But if I put the plotting functions into wandb_utils.py, I will still get an error, just one defined by us. Is that the behaviour you expect, and are we OK with it?

Contributor Author

raise ImportError("The 'wandb' python package is not installed. To get it, run 'pip install wandb'.") from e
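
The line quoted above only makes sense inside an except block. A minimal sketch of the surrounding pattern, assuming a small helper in the spirit of the wandb_utils.py idea from this thread, could look as follows. The helper name require_optional is hypothetical; the error message follows the wording quoted in the comment.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # `from e` keeps the original exception as __cause__, so the traceback
        # still shows why the import failed.
        raise ImportError(
            f"The '{module_name}' python package is not installed. "
            f"To get it, run 'pip install {module_name}'."
        ) from e
```

A plotting function would then start with wandb = require_optional("wandb"), so users without wandb see the custom message only when they actually try to log.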

@AbSsEnT AbSsEnT requested a review from rabah-khalek August 16, 2023 16:39
@rabah-khalek rabah-khalek merged commit 0f2b634 into GSK-1505-wandb Aug 16, 2023
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 3 (rating A)

Coverage: 99.2%
Duplication: 0.0%

andreybavt added a commit that referenced this pull request Aug 30, 2023
* added wandb run contextmanager

* added to_wandb for scan results

* Added new method to the Dataset class to log dataset to the WandB run. (#1294)

* Added new method to the Dataset class to log dataset to the WandB run.

* updated to_wandb

---------

Co-authored-by: Rabah Abdul Khalek <[email protected]>

* setting up the doc skeleton

* updated pyproject and pdm lock with wandb

* working on tests

* GSK-1531 (#1301)

* Added new method to the TestSuiteResult class to log its execution results to the WandB run.

* Resolved issues.

* refactoring _parse_test_name

---------

Co-authored-by: Rabah Abdul Khalek <[email protected]>

* functional tests implemented (GSK-1535)

* fixed code smell

* updated docs

* updated imports

* GSK-1533 (#1307)

* Initial commit with the implementation of the SHAP explanation graphs logging to the WandB run.

* Changed logic of obtaining feature names and types.

* Removed redundant 'model.prepare_dataframe'. Small refactoring.

* Added sorting of logged dataset, test suite result and scan result to distinct panels.

* Moved 'explain' function below shap-related functions.

* Code refactoring.

* Changed naming for variables inside functions.

* Removed explainer return, as it is not needed.

* Moved 'prepare_df' to the separate utils.py file to avoid code duplication.

* Added docstring to the '_get_cls_prediction_explanation'

* Created dataclass ShapResult to store shap explanations there and encapsulate the logic of uploading SHAP charts to the WandB.

* Refactoring of the 'background_example' function.

* Refactoring.

* Refactoring.

* Changed enum class declaration.

* Refactored model_explanation.py to be able to perform testing of explanation results equality. Added unit-tests for the SHAP logging to the WandB.

* Small fix in comments.

* Uncommented fixture.

* Refactored "_get_highest_prob_shap" function. Made it more compact and self-explainable.

* Removed #noqa options from the shap imports. Optimized imports.

* Refactored _prepare_for_explanation function. Changed naming of the function output to highlight, that this data will be explained.

* Renamed explain_full(one) to "_calculate_dataset(sample)_shap_values"

* Refactored _get_background_example function.

* Refactored 'explain_with_shap' function and 'ShapResult' dataclass for better handling classification models explanation.

* Fixed bugs with unit-tests for wandb.

* Transferred '_compare_explain_functions' to the 'test_model_explanation.py'

* Refactoring. Renaming and functions replacement.

* Renaming.

* Transferred plotting functions from the shap_result.py to the wandb_utils.py to better handle wandb importing necessity.

* small update to error msg

* updated unit test

* small update

---------

Co-authored-by: Rabah Abdul Khalek <[email protected]>

* added errors and telemetry

* fixing code smells

* fixed indent

* turned off validation of Dataset in model_explanation

* exposed explain_with_shap

* converted error to warning

* updated tests

* restored fixtures

* New example notebook to show WandB integration functionality.

* WandB notebook refactoring. Committing images.

* Removed blank cell.

* Removed blank cell.

* updated docs

* Replaced screenshots with the giskard scan result.

* Fixed 'explain_with_shap' issue, when the model is the LGBM.

* Updated screenshot with test-suite results comparison for multiple runs.

* updated pdm lock

* implementing AA's feedback

* GSK-1565 (#1339)

* Added docstrings to the "model_explanation.py".

* Added docstrings to the "shap_result.py".

* Fix in docstrings

* Added docstring to the 'Dataset.to_wandb'.

* Added docstring to the 'ScanResult.to_wandb'.

* Added docstring to the 'TestSuiteResult.to_wandb'.

* Resolved issues after PR review.

* updated docstrings

---------

Co-authored-by: Rabah Abdul Khalek <[email protected]>

---------

Co-authored-by: AbSsEnT <[email protected]>
Co-authored-by: Andrey Avtomonov <[email protected]>

Labels

enhancement, feature

3 participants