thisisnic · jtalboys · Jul 31, 2024 · Jul 31, 2024 · Jul 31, 2024 · Jul 31, 2024
diff --git a/python/binomial_test.qmd b/python/binomial_test.qmd
@@ -0,0 +1,78 @@
+---
+title: "Binomial Test"
+format: html
+editor: visual
+---
+
+The statistical test used to determine whether the proportion in a binary outcome experiment is equal to a specific value. It is appropriate when we have a small sample size and want to test the success probability $p$ against a hypothesized value $p_0$.
+
+## Creating a sample dataset
+
+-   We will generate a dataset where we record the outcomes of 1000 coin flips.
+
+-   We will use the `binom.test` function to test if the proportion of heads is significantly different from 0.5.
+
+```{python}
+import numpy as np
+from scipy.stats import binomtest
+
+# Set seed for reproducibility
+np.random.seed(19)
+coin_flips = np.random.choice(['H', 'T'], size=1000, replace=True, p=[0.5, 0.5])
+```
+
+Now, we will count the heads and tails and summarize the data.
+
+```{python}
+# Count heads and tails
+heads_count = np.sum(coin_flips == 'H')
+tails_count = np.sum(coin_flips == 'T')
+total_flips = len(coin_flips)
+
+heads_count, tails_count, total_flips
+```
+
+## Conducting Binomial Test
+
+```{python}
+# Perform the binomial test
+binom_test_result = binomtest(heads_count, total_flips, p=0.5)
+binom_test_result
+```
+
+### Results:
+
+The output has a p-value `py binom_test_result` $> 0.05$ (chosen level of significance). Hence, we fail to reject the null hypothesis and conclude that the **coin is fair**.
+
+# Example of Clinical Trial Data
+
+We load the `lung` dataset from `survival` package. We want to test if the proportion of patients with survival status 1 (dead) is significantly different from a hypothesized proportion (e.g. 50%)
+
+We will calculate number of deaths and total number of patients.
+
+```{python}
+import pandas as pd
+
+# Load the lung dataset as an example (using a mock dataset for demonstration purposes)
+lung = pd.read_csv("../data/lung_cancer.csv")
+
+# The 'status' flag in the dataset here is flipped compared to the R version
+num_deaths = np.sum(lung['status'] == 0)
+total_pat = lung.shape[0]
+
+num_deaths, total_pat
+```
+
+## Conduct the Binomial Test
+
+We will conduct the Binomial test and hypothesize that the proportion of death should be 19%.
+
+```{python}
+# Perform the binomial test
+binom_test_clinical = binomtest(num_deaths, total_pat, p=0.19)
+binom_test_clinical
+```
+
+## Results:
+
+The output has a p-value `py binom_test_clinical` $< 0.05$ (chosen level of significance). Hence, we reject the null hypothesis and conclude that **the propotion of death is significantly different from 19%**.