Tables for storing GWAS summary statistics following the GWAS-SSF standard format.
Metadata table for GWAS studies.
CREATE TABLE studies (
study_id SERIAL PRIMARY KEY,
study_accession VARCHAR(50) UNIQUE,
trait_name TEXT,
trait_ontology_id VARCHAR(50),
publication_pmid VARCHAR(20),
sample_size INTEGER,
n_cases INTEGER,
n_controls INTEGER,
genome_build VARCHAR(10),
analysis_software TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);| Column | Type | Description |
|---|---|---|
study_id |
SERIAL | Auto-incrementing primary key |
study_accession |
VARCHAR(50) | GWAS Catalog accession (e.g., GCST90012345) |
trait_name |
TEXT | Human-readable trait name |
trait_ontology_id |
VARCHAR(50) | EFO ontology term |
publication_pmid |
VARCHAR(20) | PubMed ID of publication |
sample_size |
INTEGER | Total sample size |
n_cases |
INTEGER | Number of cases (case-control only) |
n_controls |
INTEGER | Number of controls (case-control only) |
genome_build |
VARCHAR(10) | Reference genome (GRCh37/GRCh38) |
analysis_software |
TEXT | Software used for association testing |
created_at |
TIMESTAMPTZ | Record creation timestamp |
| study_id | study_accession | trait_name | sample_size | n_cases | n_controls |
|---|---|---|---|---|---|
| 1 | GCST90012345 | Type 2 Diabetes | 898130 | 74124 | 824006 |
| 2 | GCST90002357 | Height | 253288 | NULL | NULL |
Per-variant GWAS association results following GWAS-SSF standard.
CREATE TABLE gwas_summary_stats (
id SERIAL PRIMARY KEY,
variant_id BIGINT,
study_id INTEGER REFERENCES studies(study_id),
effect_allele VARCHAR(255) NOT NULL,
other_allele VARCHAR(255),
beta DOUBLE PRECISION,
odds_ratio DOUBLE PRECISION,
standard_error DOUBLE PRECISION,
p_value DOUBLE PRECISION NOT NULL,
effect_allele_frequency DOUBLE PRECISION,
n_total INTEGER,
n_cases INTEGER,
info_score DOUBLE PRECISION,
is_effect_allele_alt BOOLEAN,
UNIQUE (variant_id, study_id)
);| Column | Type | Description |
|---|---|---|
id |
SERIAL | Auto-incrementing primary key |
variant_id |
BIGINT | FK to variants table (NULL if unmatched) |
study_id |
INTEGER | FK to studies table |
effect_allele |
VARCHAR(255) | Allele tested for association |
other_allele |
VARCHAR(255) | Reference allele |
beta |
DOUBLE PRECISION | Effect size (quantitative traits) |
odds_ratio |
DOUBLE PRECISION | Odds ratio (binary traits) |
standard_error |
DOUBLE PRECISION | Standard error of effect estimate |
p_value |
DOUBLE PRECISION | Association p-value |
effect_allele_frequency |
DOUBLE PRECISION | EAF in study population |
n_total |
INTEGER | Per-variant sample size |
n_cases |
INTEGER | Per-variant case count |
info_score |
DOUBLE PRECISION | Imputation quality score |
is_effect_allele_alt |
BOOLEAN | TRUE if effect allele is ALT |
CREATE INDEX idx_gwas_pvalue ON gwas_summary_stats (p_value)
WHERE p_value < 5e-8;
CREATE INDEX idx_gwas_study_id ON gwas_summary_stats (study_id);
CREATE INDEX idx_gwas_variant_id ON gwas_summary_stats (variant_id)
WHERE variant_id IS NOT NULL;
CREATE INDEX idx_gwas_study_pvalue ON gwas_summary_stats (study_id, p_value);SELECT
v.chrom,
v.pos,
v.rs_id,
g.effect_allele,
g.beta,
g.p_value
FROM gwas_summary_stats g
JOIN variants v ON g.variant_id = v.variant_id
WHERE g.study_id = 1
AND g.p_value < 5e-8
ORDER BY g.p_value;SELECT
study_accession,
trait_name,
sample_size,
n_cases,
n_controls,
n_eff(n_cases, n_controls) as effective_n
FROM studies
WHERE n_cases IS NOT NULL;SELECT
v.chrom,
v.pos,
-LOG10(g.p_value) as neg_log_p
FROM gwas_summary_stats g
JOIN variants v ON g.variant_id = v.variant_id
WHERE g.study_id = 1
ORDER BY v.chrom, v.pos;SELECT COUNT(*)
FROM gwas_summary_stats
WHERE study_id = 1
AND info_score >= 0.8
AND p_value < 5e-8;The schema follows the GWAS Summary Statistics Format standard:
| GWAS-SSF Field | Column |
|---|---|
chromosome |
via variant_id |
base_pair_location |
via variant_id |
effect_allele |
effect_allele |
other_allele |
other_allele |
beta |
beta |
odds_ratio |
odds_ratio |
standard_error |
standard_error |
p_value |
p_value |
effect_allele_frequency |
effect_allele_frequency |
variant_id (rsID) |
via variant_id |
vcf-pg-loader import-gwas sumstats.tsv \
--study-id GCST90012345 \
--trait "Type 2 Diabetes" \
--sample-size 898130 \
--n-cases 74124 \
--n-controls 824006 \
--db postgresql://localhost/prs_dbvcf-pg-loader list-studies --db postgresql://localhost/prs_db- variants - Linked via
variant_id - prs_weights - PRS weights often derived from GWAS
- prs_candidate_variants - Pre-filtered variants with GWAS stats