Skip to content

michael-ford/variant-annotation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Gene Annotation VCF Pipeline

This is a small pipeline that leverages Nextflow to annotates tumor somatic VCFs with gene information and variant effects. ๐Ÿ’ฅ

Overview ๐ŸŽฏ

This pipeline performs the following steps:

  • Normalization: Filters VCFs to include only PASS variants and normalizes them using bcftools.
  • Annotation: Uses Ensembl VEP (via the official Docker container for release 112.0) to annotate variants with gene symbols, mutation consequences, and protein changes.
  • Parsing: Runs a custom Python script to generate summary TSV files:
    • A complete summary of all annotations.
    • A filtered summary of protein-changing variants.

Tools & Technologies ๐Ÿ”ง

  • Nextflow: Workflow orchestration for scalability and reproducibility.
  • Docker / Singularity: Uses a base Docker image, hosted on Docker Hub, as well as the official Ensembl VEP Docker image.
  • bcftools & tabix: For variant normalization and VCF indexing.
  • Ensembl VEP: The variant effect predictor (using the official docker://ensemblorg/ensembl-vep:release_112.0 image).
  • Python & pysam: For parsing and summarizing the VEP annotations.

Getting Started ๐Ÿš€

Prerequisites

  • Nextflow installed.
  • Singularity
  • A local VEP cache directory (set as an environment variable, e.g., $VEP_CACHEDIR).

Usage

Clone the repository and run the pipeline with the following command:

nextflow run annotate-vcfs.nf --vcfs <WILDCARD-VCF-PATH> -with-singularity --vep_cachedir $VEP_CACHEDIR

About

Single nextflow workflow for annotating VCF files to get affected genes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published