Skip to content
78 changes: 78 additions & 0 deletions python/binomial_test.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
title: "Binomial Test"
format: html
editor: visual
---

The statistical test used to determine whether the proportion in a binary outcome experiment is equal to a specific value. It is appropriate when we have a small sample size and want to test the success probability $p$ against a hypothesized value $p_0$.

## Creating a sample dataset

- We will generate a dataset where we record the outcomes of 1000 coin flips.

- We will use the `binom.test` function to test if the proportion of heads is significantly different from 0.5.

```{python}
import numpy as np
from scipy.stats import binomtest

# Set seed for reproducibility
np.random.seed(19)
coin_flips = np.random.choice(['H', 'T'], size=1000, replace=True, p=[0.5, 0.5])
```

Now, we will count the heads and tails and summarize the data.

```{python}
# Count heads and tails
heads_count = np.sum(coin_flips == 'H')
tails_count = np.sum(coin_flips == 'T')
total_flips = len(coin_flips)

heads_count, tails_count, total_flips
```

## Conducting Binomial Test

```{python}
# Perform the binomial test
binom_test_result = binomtest(heads_count, total_flips, p=0.5)
binom_test_result
```

### Results:

The output has a p-value `py binom_test_result` $> 0.05$ (chosen level of significance). Hence, we fail to reject the null hypothesis and conclude that the **coin is fair**.

# Example of Clinical Trial Data

We load the `lung` dataset from `survival` package. We want to test if the proportion of patients with survival status 1 (dead) is significantly different from a hypothesized proportion (e.g. 50%)

We will calculate number of deaths and total number of patients.

```{python}
import pandas as pd

# Load the lung dataset as an example (using a mock dataset for demonstration purposes)
lung = pd.read_csv("../data/lung_cancer.csv")

# The 'status' flag in the dataset here is flipped compared to the R version
num_deaths = np.sum(lung['status'] == 0)
total_pat = lung.shape[0]

num_deaths, total_pat
```

## Conduct the Binomial Test

We will conduct the Binomial test and hypothesize that the proportion of death should be 19%.

```{python}
# Perform the binomial test
binom_test_clinical = binomtest(num_deaths, total_pat, p=0.19)
binom_test_clinical
```

## Results:

The output has a p-value `py binom_test_clinical` $< 0.05$ (chosen level of significance). Hence, we reject the null hypothesis and conclude that **the propotion of death is significantly different from 19%**.