[GSK-1316, GSK-1415] Spurious Correlation tests #1302
- Decide whether it is better to have a single test_nominal_association with different methods (could be easier for the user to understand), specific tests for each metric like test_theil_u (this would be for power users, I would say), or both.
- Small refactoring of the code to use a common _test_association with the different methods (internal).
- Add tests!
    import pandas as pd

    sliced_dataset = dataset.slice(slicing_function)
    check_slice_not_empty(sliced_dataset=sliced_dataset, dataset_name="dataset", test_name="test_theil_u")

    dx = pd.DataFrame(
        {
            "slice": dataset.df.index.isin(sliced_dataset.df.index).astype(int),
            "prediction": model.predict(dataset).prediction,
        },
        index=dataset.df.index,
    )
    dx.dropna(inplace=True)

    metric = _theil_u(dx.slice, dx.prediction)
    passed = metric < threshold

    # --- debug ---
    output_ds = None
    if not passed and debug:
        output_ds = sliced_dataset.copy()  # copy all properties
        test_name = inspect.stack()[0][3]
        output_ds.name = debug_prefix + test_name
    # ---

    messages = [TestMessage(type=TestMessageLevel.INFO, text=f"metric = {metric}, threshold = {threshold}")]

    return TestResult(metric=metric, passed=passed, messages=messages, output_df=output_ds)
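
The body of `_theil_u` is not part of this excerpt, so the following is only a minimal sketch of how Theil's U (the uncertainty coefficient) is usually defined, U(X|Y) = (H(X) - H(X|Y)) / H(X). The function names, the use of plain Python sequences instead of pandas Series, and the choice to return 1.0 when X is constant are all assumptions made for illustration, not the PR's implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) of a sequence of categorical values (natural log)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) = -sum_{x,y} p(x, y) * log(p(x, y) / p(y))."""
    n = len(x)
    joint_counts = Counter(zip(x, y))
    y_counts = Counter(y)
    return -sum(
        (c / n) * math.log(c / y_counts[y_val])
        for (_, y_val), c in joint_counts.items()
    )


def theil_u(x, y):
    """Theil's U(X | Y): fraction of the uncertainty in x explained by y.

    Ranges from 0 (y says nothing about x) to 1 (y determines x) and is
    asymmetric: theil_u(x, y) generally differs from theil_u(y, x).
    """
    h_x = entropy(x)
    if h_x == 0:
        # Assumed convention: a constant x has no uncertainty left to explain.
        return 1.0
    return (h_x - conditional_entropy(x, y)) / h_x
```

Because the measure is asymmetric, the argument order in a call such as `_theil_u(dx.slice, dx.prediction)` decides which variable is being "explained". Which direction the PR intends is not stated in this excerpt.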
The only difference from the previous test is that this one calls _theil_u instead of _mutual_information. The two could be merged into a single test_nominal_association with a method attribute (e.g. theil or mutual_info) to avoid code repetition.
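
One way the suggested refactoring could look is sketched below, under several assumptions: the inputs are plain sequences rather than a model and dataset, `AssociationResult` is a hypothetical stand-in for `TestResult`, and the entropy-based measures are simple reference implementations rather than the PR's helpers. The public functions are named `nominal_association` and `theil_u_association` as stand-ins for the proposed `test_nominal_association` and `test_theil_u`, without the `test_` prefix so that pytest does not try to collect the sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


def _entropy(values: Sequence) -> float:
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def _mutual_information(x: Sequence, y: Sequence) -> float:
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return _entropy(x) + _entropy(y) - _entropy(list(zip(x, y)))


def _theil_u(x: Sequence, y: Sequence) -> float:
    # U(X | Y) = I(X; Y) / H(X); returning 1.0 for constant x is an assumed convention.
    h_x = _entropy(x)
    return _mutual_information(x, y) / h_x if h_x > 0 else 1.0


# Registry mapping the user-facing `method` value to a measure function.
_MEASURES: Dict[str, Callable[[Sequence, Sequence], float]] = {
    "theil_u": _theil_u,
    "mutual_info": _mutual_information,
}


@dataclass
class AssociationResult:  # hypothetical stand-in for TestResult
    metric: float
    passed: bool
    message: str


def _test_association(slice_flags, predictions, measure, threshold):
    """Shared logic: compute the measure and compare it with the threshold."""
    metric = measure(slice_flags, predictions)
    passed = metric < threshold
    return AssociationResult(metric, passed, f"metric = {metric}, threshold = {threshold}")


def nominal_association(slice_flags, predictions, method="theil_u", threshold=0.5):
    """Single entry point; `method` selects the association measure."""
    if method not in _MEASURES:
        raise ValueError(f"Unknown method {method!r}, expected one of {sorted(_MEASURES)}")
    return _test_association(slice_flags, predictions, _MEASURES[method], threshold)


def theil_u_association(slice_flags, predictions, threshold=0.5):
    """Thin metric-specific wrapper for users who want a dedicated test."""
    return _test_association(slice_flags, predictions, _theil_u, threshold)
```

Keeping both the generic entry point and thin wrappers covers the "or both" option from the review: the threshold comparison and message building live in one place, and adding a measure only requires a new registry entry.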
Minor comment: the docstrings exceed the maximum line length. Could you fix that?
Kudos, SonarCloud Quality Gate passed!
Titanic scan:

Todo: write docstrings