[GSK-1279] Fisher's exact test and permutation test for slice metrics significance testing #1671
Conversation
mattbit left a comment:
A few minor comments, but otherwise it looks good to me.
```python
perm_slice_dataset = Dataset(
    dataset.df.loc[slice_ids],
    target=dataset.target,
    column_types=dataset.column_types.copy(),
```
I don't think we need to make a copy here
```python
def _calculate_pvalue_from_permutation_test(
    slice_dataset, comp_dataset, dataset, model, metric, perm_test_resamples=1000
):
    logger.info("PerformanceBiasDetector: permutation test")
```
let’s keep this as debug
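(For context, a minimal sketch of the kind of resampling such a helper performs; the function and variable names below are illustrative, not the PR's actual implementation.)

```python
import numpy as np

def permutation_pvalue(slice_values, comp_values, n_resamples=1000, seed=None):
    """Two-sided permutation test on the difference of mean metric values."""
    rng = np.random.default_rng(seed)
    slice_values = np.asarray(slice_values, dtype=float)
    comp_values = np.asarray(comp_values, dtype=float)
    observed = slice_values.mean() - comp_values.mean()
    pooled = np.concatenate([slice_values, comp_values])
    n_slice = len(slice_values)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign samples to slice/complement
        diff = pooled[:n_slice].mean() - pooled[n_slice:].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # +1 smoothing keeps the estimate away from an exact zero p-value
    return (hits + 1) / (n_resamples + 1)
```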
```python
# if the slice size is too small, use Fisher's exact test, otherwise use a chi-square test
if min(min(row) for row in ctable) <= max_size_fisher:
    logger.info("PerformanceBiasDetector: Fisher's exact test")
```
logger.debug
```python
ctable = [[slice_x_cnt, slice_y_cnt], [comp_x_cnt, comp_y_cnt]]

# if the slice size is too small, use Fisher's exact test, otherwise use a chi-square test
```
It's a G-test, not a chi-square test.
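(A sketch of the test-selection logic under discussion; the helper name and the `max_size_fisher` default are assumptions. With `lambda_="log-likelihood"`, `scipy.stats.chi2_contingency` computes the G-test rather than Pearson's chi-square, which is presumably what the reviewer is pointing at.)

```python
import scipy.stats

def contingency_pvalue(ctable, max_size_fisher=5):
    # Fisher's exact test is more reliable when any cell count is small.
    if min(min(row) for row in ctable) <= max_size_fisher:
        _, pvalue = scipy.stats.fisher_exact(ctable)
    else:
        # G-test: chi-square-family test using the log-likelihood ratio.
        _, pvalue, _, _ = scipy.stats.chi2_contingency(ctable, lambda_="log-likelihood")
    return pvalue
```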
```python
    _, pvalue = scipy.stats.ttest_ind(
        slice_metric.raw_values, comp_metric.raw_values, equal_var=False, alternative=alternative
    )
elif metric.name.lower() in ["accuracy", "precision", "recall"]:
```
We should do this check differently. Let’s use some attribute on the metric object, like is_binary_metric or similar.
Maybe it's even better to give the Metric class the responsibility of calculating the contingency table entries.
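(A sketch of the suggested attribute-based dispatch; `has_binary_counts` and `binary_counts` follow the naming floated later in this review and are not the PR's final API.)

```python
import scipy.stats

def _pvalue_for_metric(metric, slice_metric, comp_metric, alternative="two-sided"):
    if getattr(metric, "has_binary_counts", False):
        # Binary success/failure counts: test the 2x2 contingency table.
        ctable = [list(slice_metric.binary_counts), list(comp_metric.binary_counts)]
        _, pvalue = scipy.stats.fisher_exact(ctable, alternative=alternative)
    else:
        # Continuous per-sample values: Welch's t-test (unequal variances).
        _, pvalue = scipy.stats.ttest_ind(
            slice_metric.raw_values,
            comp_metric.raw_values,
            equal_var=False,
            alternative=alternative,
        )
    return pvalue
```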
```python
    )
except ValueError as err:
    pvalue = np.nan
    logger.info(f"PerformanceBiasDetector: p-value could not be calculated: {err}")
```
logger.debug
mattbit left a comment:
Sorry, some small changes
```python
value: float
affected_samples: int
raw_values: Optional[np.ndarray] = None
ctable_values: Optional[list[list[int]]] = None
```
We can't have a contingency table
We can be more general and maybe just get a categorical or binary representation?
Something like binary_counts or binary_representation?
Same as above
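(A sketch of the more general result shape being suggested; the class name and field ordering are assumptions.)

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np

@dataclass
class MetricResult:
    value: float
    affected_samples: int
    raw_values: Optional[np.ndarray] = None
    # Generic (success, failure) counts instead of a full contingency
    # table, e.g. [n_correct, n_incorrect] for accuracy-like metrics.
    binary_counts: Optional[List[int]] = None
```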
```python
class Accuracy(ClassificationPerformanceMetric):
    name = "Accuracy"
    greater_is_better = True
    has_contingency_table = True
```
Same as above, this could be more general. In any case, I’m not sure it's needed if we have it on the result object.
It's not needed but I added it for clarity and "just in case". Should I remove it?
Actually, it's needed because it's used to decide whether to calculate the binary counts (to avoid calculating them for classification metrics where they wouldn't make sense, e.g. the F1 score).
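(A hypothetical helper reflecting that explanation; the accuracy-style counting is illustrative only.)

```python
import numpy as np

def maybe_binary_counts(metric, y_true, y_pred):
    # Only compute binary counts for metrics where they are meaningful
    # (e.g. accuracy), not for aggregate scores like the F1 score.
    if not getattr(metric, "has_binary_counts", False):
        return None
    n_correct = int(np.sum(np.asarray(y_true) == np.asarray(y_pred)))
    return [n_correct, len(y_true) - n_correct]
```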
```python
# column_types=dataset.column_types.copy(),
# validation=False,
```
Why is this commented out?
Forgot to remove it
```python
# column_types=dataset.column_types.copy(),
# validation=False,
```
same
```python
def test_calculate_slice_metrics():
    SLICE_SIZE = 500
    np.random.seed(42)
```
We should have the seed somewhere in the detector instead of setting the global seed here.
Yes, not sure why I put it there
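(A sketch of keeping the seed on the detector; the constructor signature is an assumption.)

```python
import numpy as np

class PerformanceBiasDetector:
    def __init__(self, perm_test_resamples=1000, seed=42):
        self.perm_test_resamples = perm_test_resamples
        # A dedicated Generator on the detector avoids mutating NumPy's
        # global random state from the tests.
        self._rng = np.random.default_rng(seed)

    def _permutation_indices(self, n_total, n_slice):
        # Randomly reassign samples to the slice for one resample.
        return self._rng.permutation(n_total)[:n_slice]
```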
mattbit left a comment:
Looks good, I’ll do some local testing.
```python
class BalancedAccuracy(ClassificationPerformanceMetric):
    name = "Balanced Accuracy"
    greater_is_better = True
    has_binary_counts = False
```
not needed
```python
class F1Score(SklearnClassificationScoreMixin, ClassificationPerformanceMetric):
    name = "F1 Score"
    greater_is_better = True
    has_binary_counts = False
```
not needed
```python
class AUC(PerformanceMetric):
    name = "ROC AUC"
    greater_is_better = True
    has_binary_counts = False
```
not needed
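(The "not needed" comments presumably point at a base-class default; a minimal sketch of that pattern, with a simplified hierarchy:)

```python
class PerformanceMetric:
    # False by default on the base class, so only binary metrics
    # (e.g. Accuracy, Precision, Recall) need to override it.
    has_binary_counts = False

class Accuracy(PerformanceMetric):
    name = "Accuracy"
    greater_is_better = True
    has_binary_counts = True
```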
Added Fisher's exact test and a permutation test for slice metrics significance testing. TODO: unit tests.