This is a small pipeline that leverages Nextflow to annotates tumor somatic VCFs with gene information and variant effects. ๐ฅ
This pipeline performs the following steps:
- Normalization: Filters VCFs to include only PASS variants and normalizes them using bcftools.
- Annotation: Uses Ensembl VEP (via the official Docker container for release 112.0) to annotate variants with gene symbols, mutation consequences, and protein changes.
- Parsing: Runs a custom Python script to generate summary TSV files:
- A complete summary of all annotations.
- A filtered summary of protein-changing variants.
- Nextflow: Workflow orchestration for scalability and reproducibility.
- Docker / Singularity: Uses a base Docker image, hosted on Docker Hub, as well as the official Ensembl VEP Docker image.
- bcftools & tabix: For variant normalization and VCF indexing.
- Ensembl VEP: The variant effect predictor (using the official docker://ensemblorg/ensembl-vep:release_112.0 image).
- Python & pysam: For parsing and summarizing the VEP annotations.
- Nextflow installed.
- Singularity
- A local VEP cache directory (set as an environment variable, e.g.,
$VEP_CACHEDIR).
Clone the repository and run the pipeline with the following command:
nextflow run annotate-vcfs.nf --vcfs <WILDCARD-VCF-PATH> -with-singularity --vep_cachedir $VEP_CACHEDIR