PRS Optimization: Complete Polygenic Risk Score Pipeline Infrastructure #41
Merged
Zacharyr41 merged 19 commits into main on Jan 5, 2026
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
GCST* identifiers are public GWAS Catalog study accessions, not secrets.
Add SampleQCConfig dataclass to allow overriding hardcoded thresholds for sex inference, call rate, contamination, and X chromosome PAR region while preserving backward compatibility through default values.
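As a rough illustration of the pattern this commit describes, the sketch below shows a config dataclass whose defaults stand in for previously hardcoded values, so callers that pass nothing keep the old behavior. All field names, threshold values, and the `passes_sample_qc` helper are hypothetical and are not taken from the PR; the PAR coordinates are commonly cited GRCh38 values and should be verified before use.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SampleQCConfig:
    """Illustrative QC thresholds. Field names and defaults are assumptions."""

    min_call_rate: float = 0.98        # minimum fraction of called genotypes
    max_contamination: float = 0.03    # maximum estimated contamination
    male_f_threshold: float = 0.8      # X-chromosome F statistic at or above: male
    female_f_threshold: float = 0.2    # X-chromosome F statistic at or below: female
    # Pseudo-autosomal regions on chrX as (start, end) pairs (assumed GRCh38).
    x_par_regions: Tuple[Tuple[int, int], ...] = (
        (10_001, 2_781_479),
        (155_701_383, 156_030_895),
    )


def passes_sample_qc(
    call_rate: float,
    contamination: float,
    config: Optional[SampleQCConfig] = None,
) -> bool:
    """Return True if the sample meets the call-rate and contamination thresholds.

    When no config is supplied, the dataclass defaults are used, which is how
    existing call sites would keep their behavior unchanged.
    """
    cfg = config or SampleQCConfig()
    return call_rate >= cfg.min_call_rate and contamination <= cfg.max_contamination
```

A caller that needs a looser call-rate cutoff would pass `SampleQCConfig(min_call_rate=0.95)` and leave every other threshold at its default.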
- Add HapMap3Downloader class for fetching reference panel from LDpred2 figshare
- Add download-reference CLI command with caching and checksum support
- Integrate cached downloads with load-reference command
- Add httpx dependency for async HTTP downloads
- Update CLI and schema documentation
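The caching and checksum behavior mentioned above could look roughly like the following. This is a minimal sketch, not the PR's `HapMap3Downloader`: the function names `sha256_of` and `find_cached` are hypothetical, SHA-256 is an assumed digest choice, and the network download step is deliberately left out.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_cached(
    cache_dir: Path,
    filename: str,
    expected_sha256: Optional[str] = None,
) -> Optional[Path]:
    """Return the cached file if it exists and matches the checksum, else None.

    A None result tells the caller that a (re)download is needed.
    """
    candidate = cache_dir / filename
    if not candidate.is_file():
        return None
    if expected_sha256 is not None and sha256_of(candidate) != expected_sha256.lower():
        return None
    return candidate
```

Treating a checksum mismatch the same as a cache miss keeps the caller simple: a corrupted or partial file is never loaded, it is just fetched again.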
- Add LDBlockDownloader for downloading Berisa & Pickrell (2016) LD blocks
- Support EUR, AFR, ASN populations from ldetect-data Bitbucket
- Extend download-reference CLI with --population option for ld-blocks
- Update load-reference to check cache first and suggest download
- Add 27 tests covering config, checksum, downloader, and CLI
- Update documentation (cli-reference, reference-tables, prs-workflows)
- Change LEFT JOIN to INNER JOIN for gwas_summary_stats so only variants with effect estimates are included (no NULL betas)
- Add gnomad_afr_af and gnomad_eas_af columns alongside gnomad_nfe_af for multi-ancestry PRS support
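To show why the join change removes NULL betas, here is a small self-contained demonstration. It uses an in-memory SQLite database purely for convenience, which is a stand-in and not the project's database; the `variants` table and all column names are hypothetical, and only the `gwas_summary_stats` table name comes from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, rsid TEXT);
    CREATE TABLE gwas_summary_stats (variant_id INTEGER, beta REAL);
    INSERT INTO variants VALUES (1, 'rs1'), (2, 'rs2'), (3, 'rs3');
    -- rs2 has no GWAS effect estimate
    INSERT INTO gwas_summary_stats VALUES (1, 0.12), (3, -0.05);
    """
)

# LEFT JOIN keeps every variant, so rs2 appears with a NULL beta.
left_rows = conn.execute(
    """
    SELECT v.rsid, g.beta
    FROM variants v
    LEFT JOIN gwas_summary_stats g ON g.variant_id = v.variant_id
    ORDER BY v.variant_id
    """
).fetchall()

# INNER JOIN keeps only variants that have a matching summary-stats row.
inner_rows = conn.execute(
    """
    SELECT v.rsid, g.beta
    FROM variants v
    INNER JOIN gwas_summary_stats g ON g.variant_id = v.variant_id
    ORDER BY v.variant_id
    """
).fetchall()

conn.close()
```

Here `left_rows` contains three rows, one of them `('rs2', None)`, while `inner_rows` contains only the two variants with effect estimates.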
35eb734 to 45a4e27
Zacharyr41 commented Jan 5, 2026 (Owner, Author)
Reviewed these. I will upload the last round shortly. I made some commits as I went through, but it looks good now.
- Fix silent error swallowing in export_plink_score
- Add shared variant matching utils with consistent chromosome normalization
- Fix resource leak in genotype loader (try/finally for VCF close)
- Add input validators for study_accession and genome_build
- Update GWAS and PRS loaders to use shared utilities
- Add docs note about materialized view GWAS dependency
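Consistent chromosome normalization matters because GWAS files, VCFs, and score files often label the same chromosome differently ("chr1", "1", "23" for X). The sketch below is one way such a shared utility might work. It is an assumption-laden illustration, not the PR's code: the Python function names and the `variant_key` helper are hypothetical, and the numeric aliases follow the PLINK convention (23 = X, 24 = Y, 26 = MT).

```python
from typing import Tuple

# PLINK-style numeric codes and common mitochondrial aliases (assumed mapping).
_CHROM_ALIASES = {"23": "X", "24": "Y", "26": "MT", "M": "MT"}


def normalize_chromosome(chrom: str) -> str:
    """Map labels such as 'chr1', 'CHRx', '23', or 'chrM' to '1', 'X', 'X', 'MT'."""
    label = chrom.strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    return _CHROM_ALIASES.get(label, label)


def variant_key(chrom: str, pos, ref: str, alt: str) -> Tuple[str, int, str, str]:
    """Build a comparable key so loaders match variants the same way."""
    return (normalize_chromosome(chrom), int(pos), ref.upper(), alt.upper())
```

If both the GWAS loader and the PRS loader build keys through one function like `variant_key`, a variant written as `chr7` in one file and `7` in another resolves to the same key.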
Overview
This PR introduces a comprehensive infrastructure for Polygenic Risk Score (PRS) calculation workflows, adding ~19,000 lines of production-ready code across 87 files. The implementation follows clinical-grade standards with full test coverage (1843 tests passing).
Key Features
1. GWAS Summary Statistics Import (GWAS-SSF Standard)
2. Genotype Data Storage with Dosage Support
3. Reference Panel Integration
4. PGS Catalog Integration
5. Quality Control Pipeline
6. Materialized Views for PRS Queries
- prs_candidate_variants: Pre-joined variants × GWAS × weights
- sample_prs_components: Per-sample score building blocks
7. Export Functions
- (.valid, .snp files)
8. SQL Validation Functions
- is_complement
- is_strand_ambiguous
- harmonize_effect_allele
- normalize_chromosome
Schema Changes
- gwas_summary_stats
- studies
- genotype_dosages
- prs_weights
- prs_scores
- hapmap3_variants
- ld_blocks
- sample_qc_metrics
- variant_qc_metrics
- population_frequencies
CLI Commands Added
Performance
Testing
Documentation
Breaking Changes
None. All changes are additive.
Test Plan
🤖 Generated with Claude Code