@rabah-khalek
Contributor

Changes the perturbation applied to numerical features from +/- 2*MAD to 10% of the sample's value.

@linear

linear bot commented Sep 4, 2023

GSK-1634 Decrease the threshold for the "makes prediction change" push feature

In this example I see nothing strange: if we decrease age by 20, it is expected that the prediction changes. It seems we need to reduce the size of the change that triggers this push feature.

(Attached screenshot: image.png)
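
To make the point above concrete, here is a minimal sketch of how the size of a perturbation decides whether a "makes prediction change" check fires. The classifier, its threshold of 35, the sample age of 45, and the helper `prediction_changes` are all hypothetical and chosen only for illustration; they are not taken from the project's code.

```python
def toy_model(age: float) -> int:
    """Hypothetical classifier: predicts 1 for age >= 35, else 0."""
    return int(age >= 35)


def prediction_changes(model, value: float, perturbed_value: float) -> bool:
    """Return True when perturbing the feature flips the model's prediction."""
    return model(value) != model(perturbed_value)


age = 45.0                        # hypothetical sample value
large_step = age - 20             # the decrease by 20 discussed in the card
small_step = age * (1 - 0.10)     # a 10% decrease of the sample value

flips_with_large_step = prediction_changes(toy_model, age, large_step)
flips_with_small_step = prediction_changes(toy_model, age, small_step)

print("decrease by 20 flips prediction:", flips_with_large_step)
print("decrease by 10% flips prediction:", flips_with_small_step)
```

With these assumed numbers, the large step crosses the toy decision boundary while the 10% step does not, which is the kind of unsurprising trigger the card asks to avoid.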

@rabah-khalek rabah-khalek self-assigned this Sep 4, 2023
@rabah-khalek rabah-khalek added the Python and push-feature labels Sep 4, 2023
# Compute the MAD of the column
mad = compute_mad(ds.df[feature]) # Small issue: distribution might not be normal
# Compute 10% around the value to be perturbed
value_to_perturb = ds_slice.df[feature].iloc[0]
Contributor

Previously we calculated the MAD over the whole series; now we take just the first element. Isn't that less robust if the first element is an outlier?

Contributor Author

The first element is the sample we get in the debugger. Yes, @jmsquare suggested, based on @AbSsEnT's feedback, that we reduce the perturbation to a fixed rate (here, 10% or less of the sample value) instead of using MAD, because it is less clear where the MAD-based range comes from and the resulting perturbation was sizeable (see the Linear card for examples).
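
The difference between the two strategies discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `mad_bounds` and `fixed_rate_bounds` are hypothetical helpers, MAD is assumed to mean the median absolute deviation around the median, and the `ages` data is made up.

```python
from statistics import median
from typing import List, Tuple


def mad_bounds(column: List[float], value: float, scale: float = 2.0) -> Tuple[float, float]:
    """Range of value +/- scale * MAD, where MAD is computed over the whole column."""
    # Assumption: MAD is the median absolute deviation around the median.
    center = median(column)
    mad = median(abs(x - center) for x in column)
    return value - scale * mad, value + scale * mad


def fixed_rate_bounds(value: float, rate: float = 0.10) -> Tuple[float, float]:
    """Range of value +/- rate * |value|, computed from the sample alone."""
    delta = abs(value) * rate
    return value - delta, value + delta


# Made-up column of ages and a single sample, as one would inspect in the debugger.
ages = [22, 25, 31, 38, 45, 52, 60, 67, 74]
sample_age = 45.0

mad_low, mad_high = mad_bounds(ages, sample_age)
pct_low, pct_high = fixed_rate_bounds(sample_age)

print("MAD-based range:  ", (mad_low, mad_high))
print("fixed-rate range: ", (pct_low, pct_high))
```

In this sketch the MAD-based range depends on the spread of the whole column, whereas the fixed-rate range depends only on the sample being inspected, so its width is easy to explain and stays small for typical values.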

@sonarqubecloud

sonarqubecloud bot commented Sep 5, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 100.0%
Duplication: 0.0%

@andreybavt andreybavt merged commit 95dcdda into main Sep 6, 2023
@Hartorn Hartorn deleted the GSK-1634-from-mad-to-10-percent-perturbation-max branch September 22, 2023 10:47
Labels

push-feature, Python

3 participants