
Conversation

@rabah-khalek
Contributor

rabah-khalek commented May 25, 2023

Calculates the independence test between a data slice and the model predictions, for both classification (chi-square) and regression (t-test).

Examples:

  • classification
from scipy import stats
import numpy as np
import pandas as pd

from giskard import Dataset, slicing_function

# `credit`, `column_types` and `model` are defined elsewhere
dataset = Dataset(credit, name="testing dataset", target="default", column_types=column_types)

@slicing_function(row_level=False)
def slice(df: pd.DataFrame):
    return df.head(10)

# binary indicator of slice membership over the full dataset
sliced_dataset = dataset.slice(slice)
overlapping_idx = dataset.df[dataset.df.isin(sliced_dataset.df)].dropna().index.values
overlapping_array = np.zeros(len(dataset.df))
overlapping_array[overlapping_idx] = 1

# contingency table: slice membership vs predicted label
predictions = model.predict(dataset).prediction
crosstab_overlap = pd.crosstab(list(overlapping_array), list(predictions))

pvalue = stats.chi2_contingency(crosstab_overlap)[1]  # 0.37906791538682305

in this case pvalue > 0.1 --> we can reject that the slice and predictions are independent (H0)

  • regression
import random

import numpy as np
from scipy.stats import ttest_ind

rand = np.array([random.randint(0, 1000) for i in range(len(dataset.df))])
ttest_ind(rand, np.arange(0, 1000), alternative="two-sided", equal_var=True).pvalue  # 0.42475623406765706

ttest_ind(np.arange(0, 1000), np.arange(0, 1000), alternative="two-sided", equal_var=True).pvalue  # 1.0

in this case pvalue > 0.1 --> we can reject that the slice and predictions are independent (H0)

rabah-khalek and others added 30 commits May 6, 2023 19:29
…000-robustness-numerical"

This reverts commit 0be71d0, reversing
changes made to 7968a6d.
andreybavt and others added 3 commits May 25, 2023 11:35
…ation_labels

task/Numpy encoder for classification_labels
refactored ml_worker module and improved `import giskard` speed
@rabah-khalek rabah-khalek requested review from jmsquare and mattbit May 25, 2023 15:25
@rabah-khalek rabah-khalek self-assigned this May 25, 2023
@rabah-khalek rabah-khalek marked this pull request as draft May 25, 2023 15:25
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 13.6%
Duplication: 0.0%

Member

@jmsquare left a comment

  • The test name should not be test_independence_chi_square, since it supports both classification and regression (t-test). My proposal is test_independence_slice.
  • The name of the test in the decorator should be changed accordingly.
  • The tags in the decorator should be "heuristic".
  • Lower priority: the output_df of the test could contain the examples inside the slice that illustrate the correlation (see the sketch after this list):
    - (i) for classification: the examples in the slice whose predicted label contributes most to the chi-square statistic;
    - (ii) for regression: the examples in the slice with the most extreme values (the ones explaining why the means of the two slices differ).
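A rough sketch of what that output_df selection could look like (the helpers are hypothetical; they assume access to the contingency table built in the test body and to the model predictions restricted to the slice):

import numpy as np
import pandas as pd
from scipy import stats


def top_chi2_examples(sliced_df: pd.DataFrame, slice_predictions: np.ndarray,
                      crosstab: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Classification: examples predicting the label that contributes most to the chi-square."""
    _, _, _, expected = stats.chi2_contingency(crosstab)
    contributions = (crosstab.values - expected) ** 2 / expected
    # Assumes the crosstab rows are ordered [outside slice (0), inside slice (1)]
    top_label = crosstab.columns[np.argmax(contributions[1])]
    return sliced_df[slice_predictions == top_label].head(top_n)


def top_extreme_examples(sliced_df: pd.DataFrame, slice_predictions: np.ndarray,
                         top_n: int = 10) -> pd.DataFrame:
    """Regression: examples whose predictions deviate most from the slice mean."""
    deviation = np.abs(slice_predictions - slice_predictions.mean())
    return sliced_df.iloc[np.argsort(deviation)[::-1][:top_n]]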


@test(name="Right Label", tags=["heuristic", "classification"])
def test_independence_chi_square(model: BaseModel, dataset: Dataset,
                                 slicing_function: SlicingFunction, threshold: float = 0.1) -> TestResult:
Member

p_value < 0.1 is an implausibly high threshold! I would set it at most to 0.01. Generally Χ² is quite powerful, so we will not miss meaningful detections if the sample size is decent (for example, I use Χ² tests in TextSlicer to find deviant tokens with p_value < 1e-3).

Comment on lines +240 to +243
original_df = dataset.df.reset_index(drop=True)
sliced_df = sliced_dataset.df.reset_index(drop=True)
overlapping_idx = original_df[original_df.isin(sliced_df)].dropna().index.values
overlapping_array[overlapping_idx] = 1
Member

Can’t we just find the complementary slice based on the original index?
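Something along these lines, perhaps (a sketch, assuming the sliced dataframe preserves the original index of dataset.df):

# Slice membership read directly from the original index,
# instead of the isin/dropna matching above
in_slice = dataset.df.index.isin(sliced_dataset.df.index)
overlapping_array = in_slice.astype(int)
complementary_df = dataset.df[~in_slice]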

Member

You should also check that the complementary slice is not empty
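For instance (a sketch reusing the names from the diff, with the same index-based membership assumption as above):

complementary_df = dataset.df[~dataset.df.index.isin(sliced_dataset.df.index)]
if len(complementary_df) == 0:
    raise ValueError("The complementary slice is empty, the test is not applicable")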

predictions = model.predict(dataset).prediction
p_value = ttest_ind(overlapping_array, predictions, alternative="two-sided", equal_var=True).pvalue

passed = p_value < threshold
Member

I am a bit confused: what are we testing? Because if it's independence, it's the other way around: p_value < threshold indicates rejection of the independence hypothesis.

@mattbit
Member

mattbit commented May 26, 2023

@rabah-khalek I think you got the test the other way around.

You wrote that:

in this case pvalue > 0.1 --> we can reject that the slice and predictions are independent (H0)

It should be the opposite: you reject the independence hypothesis if p_value < alpha. Otherwise, I may have misunderstood what we are testing.

@rabah-khalek
Contributor Author

rabah-khalek commented May 26, 2023

I wrote this at 11pm... But wait: the null hypothesis in these tests (see the docs) is that the samples are drawn from the same distribution (dependent, in our lingo).

A p-value larger than a chosen threshold (e.g. 5% or 1%) indicates that our observation is not so unlikely to have occurred by chance. Therefore, we do not reject the null hypothesis of equal population means. If the p-value is smaller than our threshold, then we have evidence against the null hypothesis of equal population means.

p_value < alpha ==> populations are different (independent in our lingo) ==> test passes.

It's true that I wrongly used H0 to refer to independence in the comment above, when in these tests it rather refers to dependence. But otherwise the test logic is the right one. Let me know if you agree @mattbit.

P.S. the _ind in ttest_ind refers to independent samples, not to independence (in terms of the hypothesis).

@mattbit
Member

mattbit commented May 26, 2023

Sorry, these tests are always a bit confusing; I think we are saying the same thing but with different terminology. Let me recap more precisely, and tell me if I'm correct and we agree:

I’m looking at the Χ² test. Reading the code, the null hypothesis of the Χ² is that the observed predictions are independent of being in the slice or not. More rigorously, $H_0$ is
$$P[\text{sample in data slice } \land \text{ predicted label}=y] = P[\text{sample in data slice}]\ P[\text{predicted label}=y]$$

You are passing a contingency table to measure this effect, i.e. belonging to the data slice should not affect the expected value of the prediction. For $p < \alpha$, you reject this hypothesis and say that the predicted label looks correlated with being in the data slice or not. In this last case, the Giskard test should not pass.
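In code, that reading would be something like this (a sketch; crosstab_overlap is the contingency table from the PR description, and alpha follows the 0.01 suggestion above):

from scipy import stats

# H0: the predicted label is independent of slice membership
p_value = stats.chi2_contingency(crosstab_overlap)[1]

alpha = 0.01
independence_rejected = p_value < alpha  # prediction looks correlated with the slice
passed = not independence_rejected       # the Giskard test fails when H0 is rejected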

If this is correct, then this is what I need for spurious correlation. Calling this test "independence" can be confusing; I think we need to be more precise. It could be test_slice_prediction_independence or something better.

Is this correct @rabah-khalek?

@mattbit
Member

mattbit commented May 26, 2023

@rabah-khalek @jmsquare I think we need to be very careful with this kind of test. The model prediction will always be dependent on the feature value; otherwise it would mean that the feature is not informative at all. If we want to point out misbehaviour, we need to restrict this detection to extreme cases only. Thinking about it now, I'm not sure the $\chi^2$ test is the correct approach here. I'm implementing this in scan, but I think it's better to spend some time testing whether this actually gives meaningful results.

@rabah-khalek
Contributor Author

Agreed @mattbit, it is a very basic test and has fundamental limitations... It'll be hard to robustly settle on a consistent definition of "spurious correlation".

@rabah-khalek
Contributor Author

Just read your previous comment. I think the confusion between us stems from the difference in what the chi-square and the t-test are testing:

from scipy import stats
from scipy.stats import ttest_ind
import pandas as pd
import numpy as np

# Two identical binary samples: perfectly dependent, with equal means
u1 = np.zeros(500)
u2 = np.zeros(500)
u1[:250] = 1
u2[:250] = 1

crosstab = pd.crosstab(u1, u2)

pvalue_chi2 = stats.chi2_contingency(crosstab)[1]  # 7.023536136418314e-110
pvalue_ttest = ttest_ind(u1, u2, alternative="two-sided", equal_var=True).pvalue  # 1.0
  • pvalue_chi2 < alpha ==> rejection of the independence hypothesis
  • pvalue_ttest < alpha ==> rejection of the equal-means hypothesis

In which case, I should've reversed the inequality for the t-test (regression) to make the interpretation consistent; see the sketch below.
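A minimal sketch of the two rejections side by side (alpha is an assumed threshold; pvalue_chi2 and pvalue_ttest come from the snippet above):

alpha = 0.01

# chi-square: small p-value => reject H0 "slice membership and predictions are independent"
independence_rejected = pvalue_chi2 < alpha

# t-test: small p-value => reject H0 "the two samples have equal means"
equal_means_rejected = pvalue_ttest < alpha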

@mattbit
Member

mattbit commented May 26, 2023

@rabah-khalek I think we can put this on hold, since we will not be working on spurious correlation for now.

@andreybavt andreybavt force-pushed the feature/ai-test-v2-merged branch from 80c1113 to be66b69 on June 7, 2023 15:24
@mattbit mattbit changed the base branch from feature/ai-test-v2-merged to main June 14, 2023 13:17
@rabah-khalek
Contributor Author

@mattbit, is this now superseded by #1178?

@mattbit
Member

mattbit commented Jun 19, 2023

@mattbit, is this now superseded by #1178?

Yes, I think so. It could still be useful to merge, though; I can take care of that in GSK-1316 if you want.

@rabah-khalek
Contributor Author

Ah okay, for the Giskard tests. Yep, have a look and let me know if anything is missing. Have we converged on our previous conversation?

@rabah-khalek
Contributor Author

Superseded by #1302.

@Hartorn Hartorn deleted the test/chi_square_independence branch September 22, 2023 10:56