Skip to content

Conversation

@rabah-khalek
Copy link
Contributor

Titanic scan:
Screenshot 2023-08-07 at 14 06 14

Todo: write docstrings

@linear
Copy link

linear bot commented Aug 7, 2023

GSK-1316 Add tests export for spurious correlation

Need to specify the tests.

@rabah-khalek rabah-khalek self-assigned this Aug 7, 2023
@rabah-khalek rabah-khalek requested a review from mattbit August 7, 2023 12:06
@rabah-khalek rabah-khalek changed the title [GSK-1316] Spurious Correlation tests [GSK-1316, GSK-1415] Spurious Correlation tests Aug 7, 2023
@linear
Copy link

linear bot commented Aug 7, 2023

GSK-1415 Implement debug for Theil's U

Waiting on implementation

@rabah-khalek rabah-khalek added the Python Pull requests that update Python code label Aug 7, 2023
Copy link
Member

@mattbit mattbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Decide if it’s better to have a single test_nominal_association with different methods (could be easier to understand for the user) or specific tests for each metric like test_theil_u (this would be for power users I would say) or both.
  • Small refactoring of code to use a common _test_association with the different methods (internal)
  • Add tests!

Comment on lines 413 to 440
import pandas as pd

sliced_dataset = dataset.slice(slicing_function)
check_slice_not_empty(sliced_dataset=sliced_dataset, dataset_name="dataset", test_name="test_theil_u")

dx = pd.DataFrame(
{
"slice": dataset.df.index.isin(sliced_dataset.df.index).astype(int),
"prediction": model.predict(dataset).prediction,
},
index=dataset.df.index,
)
dx.dropna(inplace=True)

metric = _theil_u(dx.slice, dx.prediction)
passed = metric < threshold

# --- debug ---
output_ds = None
if not passed and debug:
output_ds = sliced_dataset.copy() # copy all properties
test_name = inspect.stack()[0][3]
output_ds.name = debug_prefix + test_name
# ---

messages = [TestMessage(type=TestMessageLevel.INFO, text=f"metric = {metric}, threshold = {threshold}")]

return TestResult(metric=metric, passed=passed, messages=messages, output_df=output_ds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference with the previous test is that we call _teil_u instead of _mutual_information. This could be refactored in a single test_nominal_association with a method attribute for example (e.g. theil or mutual_info) to avoid code repetition.

@rabah-khalek rabah-khalek requested a review from mattbit August 7, 2023 15:22
Copy link
Member

@mattbit mattbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: the docstrings do not respect the max line length limit, could you fix that?

@rabah-khalek rabah-khalek merged commit 960dd02 into main Aug 8, 2023
@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 8, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

85.3% 85.3% Coverage
0.0% 0.0% Duplication

@Hartorn Hartorn deleted the GSK-1316-spurious-corr branch September 22, 2023 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Python Pull requests that update Python code

Development

Successfully merging this pull request may close these issues.

3 participants