
Disk space consumption with --gvcf option #48

@kim-fehl


After running the analysis with the --gvcf option on a 50 GB BAM file containing 4 ONT runs, using the HG19 reference, the tmp subfolder of the output directory takes 419 GB, plus another 117 GB in the main output folder.

It would probably make sense to remove the partial VCF files once they have been concatenated and sorted, and to compress the output. For instance, a 117 GB GVCF file takes only 8.5 GB when compressed with bzip2, and tools such as lbzip2 can decompress it in parallel. You may want to minimize dependencies, but disk space efficiency also matters when renting servers with fast SSDs.

547M	./tmp/full_alignment_output/candidate_bed
3.6G	./tmp/full_alignment_output
233G	./tmp/gvcf_tmp_output
117G	./tmp/merge_output
18G	./tmp/pileup_output
174M	./tmp/phase_output/phase_vcf
48G	./tmp/phase_output/phase_bam
48G	./tmp/phase_output
419G	./tmp
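The cleanup proposed above (merge the sorted partial VCFs, write one compressed GVCF, then delete the partials) might look roughly like the sketch below. It is a hypothetical Python illustration, not the tool's actual pipeline: `merge_partials` and its helpers are made-up names, and the sketch assumes each partial file is plain text, shares the same header, and is already sorted by the contig order given in its `##contig` lines. It uses a streaming k-way merge (`heapq.merge`) so the full GVCF never has to fit in memory, and the standard-library `bz2` module for compression.

```python
import bz2
import heapq
import os
from pathlib import Path
from typing import Dict, Iterator, List, Sequence


def read_header(path: Path) -> List[str]:
    """Return the '#'-prefixed header lines of a plain-text VCF."""
    header = []
    with open(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                break
            header.append(line)
    return header


def iter_records(path: Path) -> Iterator[str]:
    """Lazily yield the non-header (record) lines of a plain-text VCF."""
    with open(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                yield line


def contig_order(header: Sequence[str]) -> Dict[str, int]:
    """Map contig IDs to their rank in the ##contig header lines."""
    order: Dict[str, int] = {}
    prefix = "##contig=<ID="
    for line in header:
        if line.startswith(prefix):
            contig_id = line[len(prefix):].split(",", 1)[0].rstrip(">\n")
            order[contig_id] = len(order)
    return order


def merge_partials(partials: Sequence[Path], out_path: Path,
                   delete_partials: bool = True) -> None:
    """Hypothetical helper: k-way merge of pre-sorted partial VCFs into one
    bzip2-compressed file, then optionally delete the partials."""
    header = read_header(partials[0])  # assumes all partials share this header
    order = contig_order(header)

    def sort_key(line: str):
        chrom, pos = line.split("\t", 2)[:2]
        # Unknown contigs sort after all declared ones, then by name.
        return (order.get(chrom, len(order)), chrom, int(pos))

    streams = [iter_records(p) for p in partials]
    with bz2.open(out_path, "wt") as out:
        out.writelines(header)
        # heapq.merge pulls one line at a time from each partial.
        for line in heapq.merge(*streams, key=sort_key):
            out.write(line if line.endswith("\n") else line + "\n")

    # Reached only if the compressed output was written and closed cleanly.
    if delete_partials:
        for p in partials:
            os.remove(p)
```

Because the partials are removed only after the compressed file has been closed without error, a crash mid-merge leaves the intermediate files in place for a retry. Python's `bz2` module compresses on a single thread, so at this scale piping the merged records through a parallel compressor such as lbzip2 may well be faster.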

Metadata

Labels: enhancement (New feature or request)
