
Releases: PacificBiosciences/paraphase

Version 3.4.0

07 Nov 19:16
a72afbc

Summary of changes:

  • Fix bug where MM/ML tags are off in supplementary alignments
  • Fix bug where the last base of a read is sometimes used in phasing
  • Fix bug in variant calling where, for some reads, the wrong base is chosen between the primary and supplementary alignments
  • Fix bug where some alleles may include redundant haplotype names
  • Fix bug that prevented lowering the minimum variant frequency for variant calling in individual target regions. This change affects opn1lw and ikbkg.
  • Improve large deletion calling in reads. This change may affect all targets, particularly ikbkg and pms2.
  • Rename haplotypes to label gene1 and gene2 for regions with fusion calling (GBA, CYP2D6, CYP11B1 and CFH/CFHR3)
  • For smn1, consider more scenarios for adjusting SMN2 copy number
  • For ncf1, adjust copy number for the scenario where three haplotypes are found and all are present at two copies.
  • Update two-copy haplotypes for CFHclust
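The MM/ML fix above concerns base-modification tags. The following is a minimal sketch, not Paraphase's code. It assumes the SAMtags MM convention, where each delta counts how many occurrences of the canonical base to skip, counted along the full original read sequence. It decodes a single `C+m` entry into read positions and shows why hard-clipped supplementary records are a problem: if the stored sequence is only part of the read, the same deltas land on different bases unless they are re-offset. The function name `decode_mm_positions` is hypothetical.

```python
from __future__ import annotations


def decode_mm_positions(seq: str, mm_entry: str) -> list[int]:
    """Decode one MM entry (e.g. "C+m,1,0;") into 0-based positions in seq.

    Assumptions for this sketch: forward-strand read, a single modification
    entry, and the SAMtags convention that each delta is the number of
    canonical-base occurrences to skip before the next modified one.
    A trailing skip-mode flag ('.' or '?') on the header is ignored.
    """
    header, *deltas = mm_entry.rstrip(";").split(",")
    base = header[0]  # canonical base, e.g. "C"
    # Every occurrence of the canonical base, in read order.
    candidates = [i for i, b in enumerate(seq) if b == base]
    positions = []
    idx = -1
    for d in deltas:
        idx += int(d) + 1  # skip d candidates, then take the next one
        positions.append(candidates[idx])
    return positions


full = "ACGCCGTC"
mm = "C+m,1,0;"
print(decode_mm_positions(full, mm))     # [3, 4]

# A supplementary record whose first 2 bases were hard-clipped:
clipped = full[2:]
print(decode_mm_positions(clipped, mm))  # [2, 5], i.e. full[4] and full[7]: the wrong Cs

# Decoding against the full read and then shifting by the clip length
# keeps the calls on the intended bases (full[3] and full[4]).
print([p - 2 for p in decode_mm_positions(full, mm) if p >= 2])  # [1, 2]
```

Reverse-strand reads, multiple modification entries, and the ML probabilities (which pair with the decoded positions in order) are left out to keep the sketch short.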

Version 3.3.4

20 Aug 17:00
ef1f25c

Summary of changes:

  • Fix f-string bug that causes Python 3.11 or earlier to fail.
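The notes do not say which f-string construct was involved. One common cause, offered here only as an assumption, is reusing the enclosing quote character inside a replacement field. PEP 701 made this legal in Python 3.12, but it is a syntax error in 3.11 and earlier. The dictionary below is purely illustrative. Portable alternatives:

```python
# Python 3.12+ only (PEP 701): f"{counts["smn1"]} copies"
# That form is a SyntaxError on Python 3.11 and earlier, so it appears
# only in this comment.

counts = {"smn1": 2, "smn2": 3}  # illustrative data

# Portable: use the other quote character inside the braces.
msg_a = f"{counts['smn1']} copies"

# Also portable: bind the value first, then format it.
smn2_cn = counts["smn2"]
msg_b = f"{smn2_cn} copies"

print(msg_a)  # 2 copies
print(msg_b)  # 3 copies
```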

Version 3.3.3

15 Aug 21:58
1c265d7

Summary of changes:

  • Fix bug where min_variant_frequency cannot be set lower than the default value
  • Fix minor bug in ikbkg that causes the program to error out
  • Sort reads by name first to remove indeterminism in haplotype names
  • Do not write VCF when region is clearly not homozygous but no haplotypes are phased (most likely due to low depth)
  • Add the phase_region field in JSON output to report the coordinates of the analysis region and the genome build
  • Minor improvement on indel detection
  • Minor improvement in handling gene1_cn2, a scenario specified in the config that asks Paraphase to assume a paralog group always has two copies of gene1
  • Improve documentation
    • Update NEB tutorial to clarify the order of TRI1/2/3
    • Update README to clarify that the fusions_called field is only reported for four regions
    • Update the targeted data tutorial to include more details on PureTarget

Version 3.3.2

28 May 15:54
dce99f8

Summary of changes:

  • Fix rare scenarios where the program errors out in some low-depth regions. No algorithm change.
  • Update license.

Version 3.3.1

02 May 21:47
bc7134f

Bug fix:

  • Preserve MM/ML tags through minimap2 realignment instead of parsing them (Version 3.3.0 used pysam to parse base modifications, but we have noticed some cases where pysam v0.23 crashes when parsing base modifications)

Version 3.3.0

26 Apr 17:37
16c90e4

Summary of changes:

  • Improve phasing haplotypes into alleles, allowing the (n-1)/1 scenario
  • Do not adjust total_cn from 2 to 4 based on depth when calling fusions
  • Fix rare bugs in depth-related analysis
  • Update ALT alleles to "." in the VCF for LowQual calls where ALT is equal to REF
  • Add MM/ML tags to the Paraphase bam for base modification information
  • In the JSON output, rename hap_links to haplotype_links and rename linked_haplotypes to raw_alleles
  • For targeted data
    • Improve copy number adjustment based on depth
    • Add option to assume a paralog group to always have more than one copy of gene1
    • Update command line options to use frequency-based parameters: --min-variant-frequency and --min-haplotype-frequency
  • smn1
    • Enable smn1 analysis for CHM13-mapped data (Note that haplogroup assignment is not available with CHM13)
    • Fix bug with assigning haplogroups to smn2 haplotypes
    • Rename smn2_del78 haplotypes to smn_del78
  • hba
    • Do not consider homology haplotypes during allele phasing
    • Better handle hba with targeted data, fixing problems with identifying homology haplotypes
    • Update genotype calls based on phased alleles
    • Report genome coordinates for 3p7 and 4p2 SVs in the sv_called field
  • Minor update to strc and ncf1 for copy number adjustment based on depth
  • Update f8 to reflect SV types in the haplotype name
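Reading the "(n-1)/1 scenario" as a split of n haplotypes into two alleles where one allele carries n-1 haplotypes and the other carries a single one, the hypothetical helper below enumerates every two-allele split of a haplotype list so the (n-1)/1 cases can be seen alongside the others. This is an illustration of the split shapes only, not Paraphase's phasing algorithm, and the haplotype names are made up.

```python
from itertools import combinations


def two_allele_splits(haplotypes):
    """Yield every split of haplotypes into two non-empty alleles.

    Each unordered split is produced once: the first haplotype is always
    placed in allele 1, so mirror-image duplicates are skipped. Haplotype
    names are assumed to be unique.
    """
    first, rest = haplotypes[0], haplotypes[1:]
    # k = how many of the remaining haplotypes join `first`; stopping at
    # len(rest) - 1 keeps allele 2 non-empty.
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            allele1 = [first, *combo]
            allele2 = [h for h in rest if h not in combo]
            yield allele1, allele2


haps = ["smn1_hap1", "smn1_hap2", "smn2_hap1", "smn2_hap2"]  # hypothetical names
for a1, a2 in two_allele_splits(haps):
    label = "(n-1)/1" if min(len(a1), len(a2)) == 1 else "other"
    print(len(a1), len(a2), label)
```

For four haplotypes there are 2^3 - 1 = 7 splits. Four of them are 3/1 or 1/3, and three are 2/2.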

Version 3.2.1

10 Feb 19:44
ce232a9

Summary of changes:

  • Fix problem with CRAM input
  • Other minor changes:
    • Add RN tag to output bam. RN stands for region name, indicating reads used to analyze one region (paralog group)
    • Sort some output fields in the JSON, making it consistent from run to run
    • Clean up reported alleles, filtering out cases where all haplotypes are linked into one allele, or not all haplotypes are included when two alleles are reported
    • Fix extraction of the sample ID from the bam header when it contains blank spaces
    • Fix logic for writing homozygous haplotypes in the VCF: the condition changes from "no haplotypes phased" to "no haplotypes phased and no heterozygous variant sites"
    • Update two_cp_haplotypes when adjusting total_cn from 2 to 4 based on depth
    • Use all reads instead of unique reads for variant calling at edges of clipped haplotypes
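On making the JSON consistent from run to run: `json.dumps(..., sort_keys=True)` orders dictionary keys, but list-valued fields keep whatever order they were built in. Run-to-run differences can therefore persist unless those lists are sorted too. The sketch below is minimal, with made-up values. The field names `haplotypes` and `alleles` are illustrative and not a description of Paraphase's schema. It also assumes list fields hold only strings or lists of strings and can be treated as unordered.

```python
import json


def stable_dump(result: dict) -> str:
    """Serialize a result so identical content always gives identical text."""
    normalized = {}
    for key, value in result.items():
        # Treat list fields as unordered: sort inner lists first, then the
        # outer list, so both levels end up in a canonical order.
        if isinstance(value, list):
            value = sorted(
                sorted(v) if isinstance(v, list) else v for v in value
            )
        normalized[key] = value
    # sort_keys orders dictionary keys; it does not reorder list contents.
    return json.dumps(normalized, sort_keys=True, indent=2)


run1 = {
    "total_cn": 4,
    "haplotypes": ["smn2_hap2", "smn1_hap1"],
    "alleles": [["smn2_hap2", "smn1_hap1"], ["smn1_hap2"]],
}
run2 = {
    "alleles": [["smn1_hap2"], ["smn1_hap1", "smn2_hap2"]],
    "haplotypes": ["smn1_hap1", "smn2_hap2"],
    "total_cn": 4,
}
print(stable_dump(run1) == stable_dump(run2))  # True
```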

Version 3.2.0

25 Jan 21:32
933f7d1

Summary of changes:

  1. Updates to better handle targeted data:
    • Filter reads on rq (>=0.99), if rq is present in the input bam
    • Add a --targeted option for targeted data to drop the assumption of uniform coverage across the genome
    • Add two optional parameters for targeted data
      • --min-read-variant: Partially controls the minimum number of supporting reads a variant needs in order to be used for phasing. The cutoff is min(this number, max(5, depth*0.11)). Default is 20. At standard WGS depth, max(5, depth*0.11) is below 20, so it determines the cutoff and the default has no effect.
        • Use cases: 1) Set this number low for low-coverage data or to increase sensitivity. 2) For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time.
      • --min-read-haplotype: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and run time.
  2. Updates to target regions:
    • Update coordinates of some target regions to include full genes whenever possible: pms2, ikbkg, hba, DDT, MBD3L2, DEFA1, PRY, CHRNA7, DHX40, GOLGA8A, IQCK, NXF2, OTOA, PDPK1, POTEI, RGPD1, RGPD3, RSPH10B, SIK1, TMLHE, CBS, KCNE1, CASTOR2, NBPF4, RGPD5, GOLGA8N, POTEB, ANKRD20A1, NSF
    • Add TNXB as a region on its own so that the full gene can be genotyped (the RCCX region only includes part of TNXB)
  3. Algorithmic changes:
    • Improve fusion calling in cases of homozygous deletion
    • Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
    • Update a few gene-specific callers
      • hba: Add calling of 4.2 deletion/duplication
      • smn1: If homozygous throughout the region, default to CN = 2 instead of 1; drop carrier call if only one SMN1 haplotype is found but the total CN of SERF1A/B (neighboring locus) is larger than the total CN of SMN1/2
      • ikbkg: Improve calling of the 11.7kb deletion; update the config to genotype the entire gene
      • ncf1: Drop carrier call if only one NCF1 haplotype is found but the total CN of GTF2I (neighboring locus) is larger than the total CN of the NCF1 family
      • rccx: Better handle homozygous deletion cases
      • pms2: Update the config to genotype the entire gene
  4. Other changes:
    • Support CRAM as input
    • Standardize haplotype naming across regions: {gene name}_{haplotype name}
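The --min-read-variant cutoff described in the 3.2.0 notes can be written out directly. The sketch below implements only the stated formula, min(this number, max(5, depth*0.11)). The function name is hypothetical. The notes do not say how Paraphase measures `depth` or whether it rounds, so the result is left unrounded. In 3.3.0 these count-based options were replaced by frequency-based ones.

```python
def variant_read_cutoff(depth: float, min_read_variant: int = 20) -> float:
    """Supporting-read cutoff for phasing variants, per the 3.2.0 notes:
    min(min_read_variant, max(5, depth * 0.11)).
    """
    return min(min_read_variant, max(5, depth * 0.11))


# Typical WGS depth: 30 * 0.11 = 3.3, so the floor of 5 applies and the
# default of 20 never comes into play.
print(variant_read_cutoff(30))                        # 5
# Lowering the option below 5 raises sensitivity at low coverage.
print(variant_read_cutoff(30, min_read_variant=3))    # 3
# High-coverage targeted data: 1000 * 0.11 = 110, capped at the default 20.
print(variant_read_cutoff(1000))                      # 20
print(variant_read_cutoff(1000, min_read_variant=8))  # 8
```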

Version 3.1.2

24 Jan 23:06
f4630d2

Summary of changes:

  • Add --write-nocalls-in-vcf option to write no-call sites in the VCF

Version 3.1.1

18 Apr 18:47
8de77bb

Summary of changes:

Minor update. Fix program errors in low-depth or no-data regions. The analysis now completes even when the input is a small bamlet (the result is still a no-call).