Hello,
The majority of summary stats from various studies & biobanks are available as bgzip-ed TSVs. For converting these to VCF one can use gwas2vcf (https://github.com/MRCIEU/gwas2vcf), but at this point it supports only a fixed set of input columns. Extending it beyond that does not look like a trivial task, at least to me.
Adding these "extra" columns from the TSV to the gwas2vcf-produced VCF can be done using bcftools annotate, but that looks like a much less flexible process than vcfanno. I am positive that sooner rather than later I will have to not just copy some value from the TSV and "paste" it into the VCF, but modify it on the fly.
Hence my questions:
- Would it be possible to enhance vcfanno to handle at least the "well behaved" GWAS TSV files as an annotation source? For example, the TSV format described here: https://finngen.gitbook.io/documentation/data-description
- In the meantime, can vcfanno use a BED-VCF-like format derived from the above FinnGen TSV, with the canonical first 3 BED columns plus REF & ALT:
22 100000 100000 A T
followed by either all the remaining columns from the TSV input or just the "extras" not already present in the gwas2vcf-produced VCF?
REF and ALT are needed because the input TSV has several rows at the same position that differ only in the alternate allele, for example:
22 100000 A T bunch_of_columns_here
22 100000 A G bunch_of_columns_here
22 100000 A CG
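To make the second question concrete, here is a minimal sketch of the TSV to BED-like conversion I have in mind. It assumes the first four TSV columns are chromosome, position, REF and ALT, as in the rows above; the function name `tsv_to_bed_like`, the `#chrom` header line and the `pval` values are hypothetical and only there for illustration. By default it writes start equal to the position, as in my example line; a strict 0-based BED start would need `zero_based_start=True`. Whether vcfanno would then match on the REF and ALT columns of such a file is exactly what I am asking.

```python
import csv
import io


def tsv_to_bed_like(lines, zero_based_start=False):
    """Yield BED-like rows: chrom, start, end, ref, alt, then any extra columns.

    Assumes each input row starts with chrom, pos, ref, alt (tab-separated).
    """
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, pos, ref, alt, *extras = row
        pos = int(pos)
        # End covers the whole REF allele, so it equals pos for single-base REF.
        end = pos + len(ref) - 1
        # Default mirrors the layout in this post (start == pos);
        # a strict BED interval would use a 0-based start instead.
        start = pos - 1 if zero_based_start else pos
        yield [chrom, str(start), str(end), ref, alt, *extras]


# Hypothetical input: header name and pval values are made up for illustration.
example = io.StringIO(
    "#chrom\tpos\tref\talt\tpval\n"
    "22\t100000\tA\tT\t0.01\n"
    "22\t100000\tA\tG\t0.5\n"
    "22\t100000\tA\tCG\t0.2\n"
)
rows = list(tsv_to_bed_like(example))
for r in rows:
    print("\t".join(r))
```

The three rows at position 100000 stay distinct only because REF and ALT are carried along; without those two columns they would collapse into one interval.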
Thank you,
Darek Kedra