AlleleFlux is a bioinformatics toolkit for analyzing allele frequency changes in metagenomic time-series data. It identifies genomic targets of natural selection in microbial communities by calculating:
- Parallelism scores — detect parallel allele frequency changes across replicates within groups
- Divergence scores — quantify allele frequency divergence between experimental groups
- dN/dS ratios — measure selection pressure on genes via non-synonymous to synonymous substitution rates
These scores enable direct comparisons of evolutionary dynamics across taxa, genomes, and genes, helping identify loci under strong selection.
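To make the parallelism idea concrete, here is a minimal illustrative sketch (not AlleleFlux's actual scoring code, and the definition is an assumption for illustration): parallelism is approximated as the fraction of loci whose allele-frequency change has the same sign in every replicate.

```python
# Illustrative only: approximates "parallelism" as the fraction of loci whose
# allele-frequency change points the same direction in all replicates.
from typing import List


def parallelism_score(deltas: List[List[float]]) -> float:
    """deltas[r][i] = allele-frequency change of locus i in replicate r."""
    n_loci = len(deltas[0])
    parallel = 0
    for i in range(n_loci):
        # Sign of the change per replicate: +1, -1, or 0 (no change)
        signs = {1 if d[i] > 0 else -1 if d[i] < 0 else 0 for d in deltas}
        if len(signs) == 1 and 0 not in signs:
            parallel += 1
    return parallel / n_loci
```

For example, two replicates that agree in direction at two of three loci would score 2/3.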
Explore AlleleFlux results interactively with our new web-based dashboard:
🚀 AlleleFlux Explorer — Interactive visualization of parallelism scores, divergence, allele trajectories, and dN/dS results. (Active development)
Installation via conda typically takes 2–5 minutes on a standard desktop computer, depending on internet speed and pre-existing dependencies. All dependencies are listed in environment.yml. This software has been tested on Red Hat Enterprise Linux 9.
# Install with conda (or mamba)
conda install -c conda-forge -c bioconda alleleflux
# Activate the environment
conda activate alleleflux

# Or, to install from source, clone the repository
git clone https://github.com/MoellerLabPU/AlleleFlux.git
cd AlleleFlux
# Create environment with dependencies
conda env create -f environment.yml
conda activate alleleflux
# Or install directly with pip
pip install -e .

AlleleFlux requires the following input files:

| File | Description |
|---|---|
| Reference FASTA | Combined MAG contigs (header format: <MAG_ID>.fa_<contig_ID>) |
| Prodigal genes | Nucleotide ORF predictions (.fna) matching reference contig IDs |
| MAG mapping | TSV with columns: mag_id, contig_id |
| Metadata TSV | Sample info with columns: sample_id, bam_path, group, time |
| GTDB taxonomy | gtdbtk.bac120.summary.tsv from GTDB-Tk |
See Input Preparation Guide for detailed format specifications.
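A quick pre-flight check on the metadata TSV can save a failed pipeline run. This sketch assumes only the column names listed in the table above (sample_id, bam_path, group, time); adapt it to your files.

```python
# Sanity-check the sample metadata TSV before launching the pipeline:
# verify the required columns exist and each referenced BAM file is present.
import csv
import os

REQUIRED_COLUMNS = {"sample_id", "bam_path", "group", "time"}


def check_metadata(path: str) -> list:
    """Return a list of problems found in the metadata TSV (empty = OK)."""
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
            return problems
        for row in reader:
            if not os.path.exists(row["bam_path"]):
                problems.append(f"BAM not found: {row['bam_path']}")
    return problems
```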
# Interactive configuration wizard
alleleflux init
# Or copy the template manually
cp $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/config.template.yml config.yml

Edit config.yml with your paths and parameters. Here is the complete configuration with all options:
run_name: "alleleflux_analysis"
# Input Files
input:
fasta_path: "" # Reference FASTA file (required)
prodigal_path: "" # Prodigal nucleic acid output (.fna)
metadata_path: "" # Sample metadata file
gtdb_path: "" # GTDB taxonomy file
mag_mapping_path: "" # MAG-to-contig mapping file
# Output Directory
output:
root_dir: "./alleleflux_output"
log_level: "INFO" # DEBUG, INFO, WARNING, ERROR
# Analysis Configuration
analysis:
data_type: "longitudinal" # "single" or "longitudinal"
allele_analysis_only: False # Skip scoring/outlier detection
use_lmm: True # Linear Mixed Models
use_significance_tests: True # Two-sample and single-sample tests
use_cmh: True # CMH test
timepoints_combinations:
- timepoint: ["pre", "post"]
focus: "post"
groups_combinations:
- ["treatment", "control"]
# Quality Control
quality_control:
min_sample_num: 4
breadth_threshold: 0.1
coverage_threshold: 1
disable_zero_diff_filtering: False
# Profiling
profiling:
ignore_orphans: False
min_base_quality: 30
min_mapping_quality: 2
ignore_overlaps: True
# Statistics
statistics:
filter_type: "t-test" # "t-test", "wilcoxon", or "both"
preprocess_between_groups: True
preprocess_within_groups: True
max_zero_count: 4
p_value_threshold: 0.05
fdr_group_by_mag_id: False
min_positions_after_preprocess: 1
# dN/dS Analysis
dnds:
p_value_column: "q_value"
dn_ds_test_type: "two_sample_paired_tTest"
# Compute Resources
resources:
threads_per_job: 16
mem_per_job: "8G"
  time: "24:00:00"

| Section | Parameter | Description |
|---|---|---|
| input | fasta_path | Reference FASTA with combined MAG contigs |
| | prodigal_path | Prodigal nucleotide predictions (.fna) |
| | metadata_path | Sample metadata TSV |
| | gtdb_path | GTDB-Tk taxonomy file |
| | mag_mapping_path | MAG-to-contig mapping TSV |
| analysis | data_type | "longitudinal" (multiple timepoints) or "single" |
| | allele_analysis_only | Skip significance tests, scoring, and outlier detection if True |
| | use_lmm | Enable Linear Mixed Models |
| | use_significance_tests | Enable two-sample/single-sample tests |
| | use_cmh | Enable Cochran-Mantel-Haenszel tests |
| | timepoints_combinations | Timepoint pairs with focus timepoint |
| | groups_combinations | Groups to compare |
| quality_control | min_sample_num | Minimum samples required per MAG |
| | breadth_threshold | Minimum coverage breadth (0-1) |
| | coverage_threshold | Minimum average coverage depth |
| profiling | min_base_quality | Minimum base quality score |
| | min_mapping_quality | Minimum mapping quality score |
| statistics | filter_type | Preprocessing filter type |
| | p_value_threshold | Significance threshold |
| | fdr_group_by_mag_id | Apply FDR correction per MAG |
| dnds | p_value_column | "min_p_value" or "q_value" |
| | dn_ds_test_type | Test type for filtering dN/dS results |
| resources | threads_per_job | Threads allocated per job |
| | mem_per_job | Memory per job (e.g., "8G", "100G") |
| | time | Wall time limit (HH:MM:SS) |
See Configuration Reference for complete documentation.
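The pipeline validates the configuration itself, but a quick guard over the input section catches empty or stale paths early. This is a sketch assuming the keys shown in config.yml above, operating on the config after it has been loaded into a dict.

```python
# Check that every required entry in the config's "input" section is set
# and points to an existing file. Expects the already-parsed config dict.
import os

REQUIRED_INPUTS = [
    "fasta_path",
    "prodigal_path",
    "metadata_path",
    "gtdb_path",
    "mag_mapping_path",
]


def validate_input_section(cfg: dict) -> list:
    """Return a list of error strings for the 'input' section (empty = OK)."""
    errors = []
    inputs = cfg.get("input", {})
    for key in REQUIRED_INPUTS:
        path = inputs.get(key, "")
        if not path:
            errors.append(f"input.{key} is not set")
        elif not os.path.exists(path):
            errors.append(f"input.{key} does not exist: {path}")
    return errors
```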
# Run locally
alleleflux run --config config.yml --threads 16
# Dry run to preview jobs
alleleflux run --config config.yml --dry-run

For HPC clusters, copy the SLURM profile from the source repository:
# Copy SLURM profile (if installed from source)
cp -r $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/slurm_profile ./
# Run with SLURM
alleleflux run --config config.yml --profile ./slurm_profile

The SLURM profile automatically submits jobs via sbatch, using the resources specified in your config.
AlleleFlux is powered by a Snakemake workflow that orchestrates the complete analysis:
Input Files Profile & QC Statistical Analysis
━━━━━━━━━━━ ━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━
• Reference FASTA • Extract alleles • Two-sample tests
• Prodigal genes • Quality control • LMM / CMH tests
• Metadata TSV • Eligibility checks • dN/dS calculation
• MAG mapping ↓
┌─────────────────┐
│ Scoring & Viz │
├─────────────────┤
│ • Parallelism │
│ • Divergence │
│ • Outliers │
│ • Trajectories │
└─────────────────┘
Pipeline Steps:
- Profiling — Extract allele frequencies from BAM files for each MAG
- Quality Control — Filter samples by coverage breadth; determine MAG eligibility
- Statistical Testing — Apply appropriate tests based on experimental design
- Scoring — Calculate parallelism/divergence scores and identify outlier genes
- dN/dS Analysis — Calculate evolutionary rates for genes under selection
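The dN/dS idea behind the last step can be sketched in a few lines: divide the observed nonsynonymous and synonymous substitutions by the number of available sites of each class (Nei–Gojobori-style counting), then take the ratio. This is an illustration of the concept, not AlleleFlux's implementation.

```python
# Conceptual dN/dS ratio from substitution and site counts:
# dN = nonsynonymous substitutions per nonsynonymous site,
# dS = synonymous substitutions per synonymous site.
def dn_ds(nonsyn_subs: int, syn_subs: int,
          nonsyn_sites: float, syn_sites: float) -> float:
    dN = nonsyn_subs / nonsyn_sites
    dS = syn_subs / syn_sites
    if dS == 0:
        # Undefined without synonymous changes; signal rather than divide by zero
        return float("inf") if dN > 0 else float("nan")
    return dN / dS
```

Ratios above 1 suggest positive selection, below 1 purifying selection, and near 1 neutral evolution.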
The workflow:
- Automatically parallelizes across samples and MAGs
- Handles checkpointing and restarts gracefully
- Supports local execution and HPC clusters (SLURM)
- Tracks provenance and ensures reproducibility
Results are organized in the output directory:
alleleflux_output/
├── profiles/ # Per-sample allele frequency profiles
├── metadata/ # Per-MAG metadata tables
├── eligibility/ # MAG eligibility tables
├── allele_analysis/ # Allele frequency analysis results
├── significance_tests/ # Statistical test results (LMM, CMH, t-tests)
├── scores/ # Parallelism and divergence scores
├── outliers/ # Genes with high scores (selection targets)
└── dnds/ # dN/dS analysis results
See Output Reference for file format details.
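For downstream analysis, the score tables can be collected programmatically. A small sketch, assuming only the directory layout shown above (file names inside scores/ are not specified here; see the Output Reference):

```python
# Gather all TSV tables under the scores/ subdirectory of an AlleleFlux run.
from pathlib import Path


def find_score_tables(output_root: str) -> list:
    """Return sorted paths of every .tsv under <output_root>/scores/."""
    return sorted(Path(output_root, "scores").rglob("*.tsv"))
```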
AlleleFlux provides 30+ standalone command-line tools:
# List all available tools
alleleflux tools
# Main commands
alleleflux run --help # Run the full pipeline
alleleflux init --help # Interactive configuration
alleleflux info # Show installation info
# Individual analysis tools
alleleflux-profile --help # Profile MAGs from BAM files
alleleflux-qc --help # Quality control
alleleflux-scores --help # Calculate parallelism/divergence scores
alleleflux-dnds-from-timepoints --help # Calculate dN/dS ratios

See CLI Reference for the complete list.
Full documentation: alleleflux.readthedocs.io
Contributions are welcome! See CONTRIBUTING.md for guidelines.
AlleleFlux is licensed under the GNU General Public License v3.0.