GSK-1533 #1307
… logging to the WandB run.
… distinct panels.
GSK-1533 SHAP logging in wandb
Let's create a function that takes in a giskard model and dataset, runs SHAP on all data samples and, to start with, outputs the Shapley values of each feature per sample as an array/dataframe. Let's start with numerical and categorical values; we'll extend it to text later on. We can then find a way to present these outputs as interactive graphs in wandb. These already exist in |
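To make the requested output shape concrete (one Shapley value per feature, per sample), here is a minimal, library-free sketch. It is not the giskard implementation and does not use the shap package: it brute-forces exact Shapley values for a toy model, replacing "missing" features with a single background example. The names `shapley_values`, `explain_dataset`, `predict` and `background` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    sample: Sequence[float],
    background: Sequence[float],
    feature_names: Sequence[str],
) -> Dict[str, float]:
    """Exact Shapley values for one sample (cost grows as 2^n, toy sizes only)."""
    n = len(feature_names)

    def value(coalition: frozenset) -> float:
        # Features outside the coalition are replaced by the background value.
        mixed = [sample[i] if i in coalition else background[i] for i in range(n)]
        return predict(mixed)

    result = {}
    for i, name in enumerate(feature_names):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        result[name] = total
    return result


def explain_dataset(
    predict: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    background: Sequence[float],
    feature_names: Sequence[str],
) -> List[Dict[str, float]]:
    """One dict of per-feature Shapley values for each row of the dataset."""
    return [shapley_values(predict, row, background, feature_names) for row in rows]
```

The returned list of dicts maps directly onto a dataframe with one row per sample and one column per feature; for each sample the values sum to the difference between the model's prediction on that sample and on the background example.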
…apsulate the logic of uploading SHAP charts to the WandB.
AbSsEnT
left a comment
Done
…anation results equality. Added unit-tests for the SHAP logging to the WandB.
…d self-explainable.
…unction output to highlight, that this data will be explained.
…r better handling classification models explanation.
from typing import Any, Iterable
from dataclasses import dataclass

import wandb
import wandb shouldn't be here; as we said, it would break. Please import it only where we need it.
The point is that I am going to import the "shap_results.py" module only when I need the ShapResults class, which requires wandb. I do not need anything else from this module that does not use wandb, i.e. I am going to use this module only if wandb is installed. Otherwise, I need to put the wandb import into all private functions and ShapResults.to_wandb(), which looks like overhead. WDYT?
It's because we decided to make explain_with_shap a global function that returns ShapResult, so that users can call it without necessarily wanting to log the result to wandb (maybe later we will want to log it to mlflow or other MLOps tools).
Importing the same library several times is not an issue in Python, btw; see: https://stackoverflow.com/questions/37067414/python-import-multiple-times
If you don't like the idea of importing wandb in every plotting function, you can move those functions into wandb_utils.py; I think that's a cleaner solution.
But if I put the plotting functions into wandb_utils.py, I will still get an error, just one defined by us. Is that the expected behaviour, and are we OK with it?
raise ImportError("The 'wandb' python package is not installed. To get it, run 'pip install wandb'.") from e
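For context, the line above fits a common lazy-import pattern: the optional dependency is imported only at the point of use, and a missing package is re-raised with an actionable message. The sketch below is a generic illustration, not the code merged in this PR; the helper name `require_package` is hypothetical.

```python
import importlib
from types import ModuleType


def require_package(package: str) -> ModuleType:
    """Import an optional dependency on demand, with a helpful error if it is missing."""
    try:
        return importlib.import_module(package)
    except ImportError as e:
        # Chain the original exception so the root cause stays in the traceback.
        raise ImportError(
            f"The '{package}' python package is not installed. "
            f"To get it, run 'pip install {package}'."
        ) from e
```

A plotting function would then call something like `wandb = require_package("wandb")` as its first statement, so importing the surrounding module never fails when wandb is absent; only the functions that actually need it raise.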
…tils.py to better handle wandb importing necessity.
Kudos, SonarCloud Quality Gate passed!
* added wandb run contextmanager
* added to_wandb for scan results
* Added new method to the Dataset class to log dataset to the WandB run. (#1294)
* Added new method to the Dataset class to log dataset to the WandB run.
* updated to_wandb
---------
Co-authored-by: Rabah Abdul Khalek <[email protected]>
* setting up the doc skeleton
* updated pyproject and pdm lock with wandb
* working on tests
* GSK-1531 (#1301)
* Added new method to the TestSuiteResult class to log its execution results to the WandB run.
* Resolved issues.
* refactoring _parse_test_name
---------
Co-authored-by: Rabah Abdul Khalek <[email protected]>
* functional tests implemented (GSK-1535)
* fixed code smell
* updated docs
* updated imports
* GSK-1533 (#1307)
* Initial commit with the implementation of the SHAP explanation graphs logging to the WandB run.
* Changed logic of obtaining feature names and types.
* Removed redundant 'model.prepare_dataframe'. Small refactoring.
* Added sorting of logged dataset, test suite result and scan result to distinct panels.
* Moved 'explain' function below shap-related functions.
* Code refactoring.
* Changed naming for variables inside functions.
* Removed explainer return, as it is not needed.
* Moved 'prepare_df' to the separate utils.py file to avoid code duplication.
* Added docstring to the '_get_cls_prediction_explanation'
* Created dataclass ShapResult to store shap explanations there and encapsulate the logic of uploading SHAP charts to the WandB.
* Refactoring of the 'background_example' function.
* Refactoring.
* Refactoring.
* Changed enum class declaration.
* Refactored model_explanation.py to be able to perform testing of explanation results equality. Added unit-tests for the SHAP logging to the WandB.
* Small fix in comments.
* Uncommented fixture.
* Refactored "_get_highest_prob_shap" function. Made it more compact and self-explainable.
* Removed #noqa options from the shap imports. Optimized imports.
* Refactored _prepare_for_explanation function. Changed naming of the function output to highlight, that this data will be explained.
* Renamed explain_full(one) to "_calculate_dataset(sample)_shap_values"
* Refactored _get_background_example function.
* Refactored 'explain_with_shap' function and 'ShapResult' dataclass for better handling classification models explanation.
* Fixed bugs with unit-tests for wandb.
* Transferred '_compare_explain_functions' to the 'test_model_explanation.py'
* Refactoring. Renaming and functions replacement.
* Renaming.
* Transferred plotting functions from the shap_result.py to the wandb_utils.py to better handle wandb importing necessity.
* small update to error msg
* updated unit test
* small update
---------
Co-authored-by: Rabah Abdul Khalek <[email protected]>
* added errors and telemetry
* fixing code smells
* fixed indent
* turned off validation of Dataset in model_explanation
* exposed explain_with_shap
* converted error to warning
* updated tests
* restored fixtures
* New example notebook to show WandB integration functionality.
* WandB notebook refactoring. Committing images.
* Removed blank cell.
* Removed blank cell.
* updated docs
* Replaced screenshots with the giskard scan result.
* Fixed 'explain_with_shap' issue, when the model is the LGBM.
* Updated screenshot with test-suite results comparison for multiple runs.
* updated pdm lock
* implementing AA's feedback
* GSK-1565 (#1339)
* Added docstrings to the "model_explanation.py".
* Added docstrings to the "shap_result.py".
* Fix in docstrings
* Added docstring to the 'Dataset.to_wandb'.
* Added docstring to the 'ScanResult.to_wandb'.
* Added docstring to the 'TestSuiteResult.to_wandb'.
* Resolved issues after PR review.
* updated docstrings
---------
Co-authored-by: Rabah Abdul Khalek <[email protected]>
---------
Co-authored-by: AbSsEnT <[email protected]>
Co-authored-by: Andrey Avtomonov <[email protected]>








Added SHAP information logging as part of the WandB integration.