
Conversation

@mattbit mattbit (Member) commented Jun 13, 2023

Fixes #1159.

@linear linear bot commented Jun 13, 2023

GSK-1275 Importance of metrics calculated on partial data slice

User KD_A on Reddit pointed out:

And does the # samples refer to the # samples in the data slice? It should be the denominator of the metric for the slice. For example, if that second row's recall of 0.111 is 1 predicted positive / 9 true positive, it's debatable whether to flag that.

This is right. We may have 1000 samples in our data slice, but to calculate, for example, the recall we only use the positive samples, which may be just a few out of the total, making the detection a false positive.
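A minimal numpy sketch of the issue described above: the slice nominally has 1000 samples, but the recall denominator (true positives + false negatives) only counts the positives, so the metric is effectively computed on a handful of samples. The variable names here are illustrative, not from the codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

# A data slice of 1000 samples, of which only 9 are positive.
y_true = np.zeros(1000, dtype=int)
pos_idx = rng.choice(1000, size=9, replace=False)
y_true[pos_idx] = 1

# The model recovers only one of the nine positives.
y_pred = np.zeros(1000, dtype=int)
y_pred[pos_idx[0]] = 1

tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
recall = tp / (tp + fn)

print(len(y_true))  # nominal slice size: 1000
print(tp + fn)      # effective sample size for recall: 9
print(recall)       # 1/9 ≈ 0.111
```

The "# samples" a scan report should use when judging whether a metric like this is significant is the effective denominator (9 here), not the slice size (1000).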

@mattbit mattbit force-pushed the task/fix-scan-metrics branch from 240413c to a5bf2d4 on June 19, 2023 at 09:51
@mattbit mattbit changed the base branch from main to task/GSK-1078 June 20, 2023 09:38
mattbit added 3 commits June 20, 2023 15:25
…GSK-1279]

- [GSK-1275] Fixes problems with metrics calculated on small samples
- [GSK-1279] Experimental support for false discovery rate control via Benjamini–Hochberg procedure
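For the second commit above, a sketch of the Benjamini–Hochberg step-up procedure it refers to. This is the textbook procedure, written from its standard definition, not the PR's actual implementation: reject the hypotheses with the k smallest p-values, where k is the largest index such that p_(k) <= (k/m) * alpha.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # BH critical values: (k / m) * alpha for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest k with p_(k) <= (k/m) * alpha
        rejected[order[: k + 1]] = True
    return rejected

p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg(p_values))  # first two hypotheses rejected
```

Note the step-up behavior: every hypothesis up to the largest qualifying rank is rejected, even if an intermediate p-value sits above its own threshold.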
@mattbit mattbit force-pushed the task/fix-scan-metrics branch from 874fd77 to 11a84a5 on June 20, 2023 at 14:30
@mattbit mattbit marked this pull request as ready for review June 20, 2023 14:31
@mattbit mattbit requested a review from andreybavt June 21, 2023 10:19
@andreybavt andreybavt (Contributor) left a comment:
LGTM overall, with some questions.

def _calculate_affected_samples(self, y_true: np.ndarray, y_pred: np.ndarray, model: BaseModel) -> int:
    if model.is_binary_classification:
        # F1 score will not be affected by true negatives
        neg = model.meta.classification_labels[0]
@andreybavt (Contributor) commented:

Why do we only do it for binary classification and not for all cases?

@mattbit (Member, Author) replied:

Because the way F1 is computed for multiclass is different. In our case it uses the total counts of true positives, false positives, and false negatives in a one-vs-rest fashion for each class, so in the end every sample contributes to the score.
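The binary case in the snippet above excludes true negatives because they appear in none of the F1 terms. A hedged sketch of that logic as a standalone function (the function name and the plain numpy interface are illustrative, not Giskard's actual API):

```python
import numpy as np

def affected_samples_binary_f1(y_true: np.ndarray, y_pred: np.ndarray, neg_label) -> int:
    """Count samples that can influence a binary F1 score.

    F1 depends only on true positives, false positives, and false
    negatives, so true negatives are excluded from the count.
    """
    true_negatives = np.sum((y_true == neg_label) & (y_pred == neg_label))
    return len(y_true) - int(true_negatives)

y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 0])
print(affected_samples_binary_f1(y_true, y_pred, neg_label=0))  # 3 (two true negatives excluded)
```

In the multiclass one-vs-rest setting, by contrast, every sample is a true positive, false positive, or false negative for at least one class, so the affected-sample count is simply the slice size.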

@mattbit mattbit requested a review from andreybavt June 21, 2023 14:33
@andreybavt andreybavt merged commit f1f9465 into task/GSK-1078 Jun 21, 2023
@sonarqubecloud commented:

SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 3 (rating A)

Coverage: 100.0%
Duplication: 0.0%


Labels: none yet

Development

Successfully merging this pull request may close issue [GSK-1275] Importance of metrics calculated on partial data slice.

3 participants