Releases: PacificBiosciences/paraphase
Version 3.4.0
Summary of changes:
- Fix bug where MM/ML tags are off in supplementary alignments
- Fix bug where the last base of a read is sometimes used in phasing
- Fix bug in variant calling where in some reads a wrong base is chosen between primary and supplementary alignments
- Fix bug where some alleles may include redundant haplotype names
- Fix bug where it's not possible to lower the minimum variant frequency for variant calling for individual target regions. This change affects opn1lw and ikbkg.
- Improve large deletion calling in reads. This change may affect all targets, particularly ikbkg and pms2.
- Rename haplotypes to label gene1 and gene2 for regions with fusion calling (GBA, CYP2D6, CYP11B1 and CFH/CFHR3)
- For smn1, consider more scenarios for adjusting SMN2 copy number
- For ncf1, adjust copy number for the scenario where three haplotypes are found and all are present at two copies.
- Update two copy haplotypes for CFHclust
Version 3.3.4
Summary of changes:
- Fix f-string bug that causes Paraphase to fail on Python 3.11 or earlier.
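
The note above does not say which construct failed. One common cause of this class of bug, shown here purely as an assumed illustration, is reusing the enclosing quote character inside an f-string replacement field, which only parses on Python 3.12 and later (PEP 701). The function name and dictionary keys below are hypothetical and are not taken from the Paraphase code base.

```python
def haplotype_label(record: dict) -> str:
    """Build a label without reusing the outer quote inside the f-string."""
    # Not portable: f"{record["gene"]}_{record["hap"]}" is a SyntaxError on
    # Python 3.11 and earlier, because the inner double quotes end the string.
    # Portable: use a different quote type inside the replacement field.
    return f"{record['gene']}_{record['hap']}"


# Hypothetical record, for illustration only.
print(haplotype_label({"gene": "smn1", "hap": "hap1"}))
```

Switching the inner quotes keeps the same source valid on both older and newer interpreters.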
Version 3.3.3
Summary of changes:
- Fix bug that min_variant_frequency cannot be set lower than the default value
- Fix minor bug in ikbkg that causes program to error out
- Sort reads by name first to remove indeterminism in haplotype names
- Do not write VCF when region is clearly not homozygous but no haplotypes are phased (most likely due to low depth)
- Add the phase_region field in JSON output to report the coordinates of the analysis region and the genome build
- Minor improvement on indel detection
- Minor improvement on handling gene1_cn2, a scenario specified in the config asking Paraphase to assume a paralog group to always have two copies of gene1
- Improve documentation
  - Update NEB tutorial to clarify on the order of TRI1/2/3
  - Update README to clarify that the fusions_called field is only reported for four regions
  - Update the targeted data tutorial to include more details on PureTarget
Version 3.3.2
Summary of changes:
- Fix rare scenarios when program errors out in some low-depth regions. No algorithm change.
- Update license.
Version 3.3.1
Bug fix:
- Preserve MM/ML tags through minimap2 realignment instead of parsing MM/ML tags (Version 3.3.0 uses pysam to parse base modifications, but we have noticed some cases where pysam v0.23 crashes when parsing base modifications)
Version 3.3.0
Summary of changes:
- Improve phasing haplotypes into alleles, allowing (n-1)/1 scenario
- Do not adjust total_cn from 2 to 4 based on depth when calling fusions
- Fix rare bugs in depth-related analysis
- Update ALT alleles to "." in the VCF for LowQual calls when ALT is equal to REF
- Add Ml/Mm tags in Paraphase bam for base modification information
- In JSON output, rename hap_links to haplotype_links and rename linked_haplotypes to raw_alleles
- For targeted data
  - Improve copy number adjustment based on depth
  - Add option to assume a paralog group to always have more than one copy of gene1
  - Update command line options to use frequency-based parameters: --min-variant-frequency and --min-haplotype-frequency
- smn1
  - Enable smn1 analysis for CHM13-mapped data (Note that haplogroup assignment is not available with CHM13)
  - Fix bug with assigning haplogroups to smn2 haplotypes
  - Rename smn2_del78 haplotypes to smn_del78
- Enable hba
  - Do not consider homology haplotypes during allele phasing
  - Better handle hba with targeted data, fixing problems with identifying homology haplotypes
  - Update genotype calls based on phased alleles
  - Report genome coordinates for 3p7 and 4p2 SVs in the sv_called field
- Minor update to strc and ncf1 for copy number adjustment based on depth
- Update f8 to reflect SV types in the haplotype name
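
Scripts that read JSON output from both before and after 3.3.0 need to cope with the two field renames listed above. A minimal sketch of one way to do that follows. The helper name, the assumption that both fields sit in the same per-region dictionary, and the placeholder values are illustrative and not taken from Paraphase itself.

```python
import json

# Old field name (before 3.3.0) -> new field name (3.3.0 and later),
# as listed in the release notes.
RENAMED_FIELDS = {
    "hap_links": "haplotype_links",
    "linked_haplotypes": "raw_alleles",
}


def normalize_region(region: dict) -> dict:
    """Return a copy of one region's record that uses the 3.3.0 field names."""
    normalized = dict(region)
    for old, new in RENAMED_FIELDS.items():
        # Only rename when the old key is present and the new one is not.
        if old in normalized and new not in normalized:
            normalized[new] = normalized.pop(old)
    return normalized


# Hypothetical pre-3.3.0 record; the values are placeholders, not real output.
old_record = json.loads('{"hap_links": {}, "linked_haplotypes": []}')
print(sorted(normalize_region(old_record)))
```

Records that already use the new names pass through unchanged, so the same helper can be applied to output from either version.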
Version 3.2.1
Summary of changes:
- Fix problem working with CRAM
- Other minor changes:
  - Add RN tag to output bam. RN stands for region name, indicating reads used to analyze one region (paralog group)
  - Sort some output fields in the JSON, making it consistent from run to run
  - Clean up reported alleles, filtering out cases where all haplotypes are linked into one allele, or not all haplotypes are included when two alleles are reported
  - Fix getting sample ID from bam header when there are blank spaces
  - Fix logic for writing homozygous haplotypes in VCF (no haplotypes phased -> no haplotypes phased and no heterozygous variant sites)
  - Update two_cp_haplotypes when adjusting total_cn from 2 to 4 based on depth
  - Use all reads instead of unique reads for variant calling at edges of clipped haplotypes
  - Add
Version 3.2.0
Summary of changes:
- Updates to better handle targeted data
  - Filter reads on rq (>=0.99), if rq is present in input bam
  - Add a --targeted option for targeted data to drop the assumption of uniform coverage across the genome
  - Add two optional parameters for targeted data
    - --min-read-variant: Partially controls the number of supporting reads for a variant for identifying variants used for phasing. The cutoff for variant-supporting reads is determined by min(this number, max(5, depth*0.11)). Default is 20. At standard WGS depth, the default value is overwritten by max(5, depth*0.11).
      - Use cases: 1) Set this number low for low-coverage data or to increase sensitivity. 2) For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time.
    - --min-read-haplotype: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and to reduce run time.
- Updates to target regions:
  - Update coordinates of some target regions to include full genes whenever possible: pms2, ikbkg, hba, DDT, MBD3L2, DEFA1, PRY, CHRNA7, DHX40, GOLGA8A, IQCK, NXF2, OTOA, PDPK1, POTEI, RGPD1, RGPD3, RSPH10B, SIK1, TMLHE, CBS, KCNE1, CASTOR2, NBPF4, RGPD5, GOLGA8N, POTEB, ANKRD20A1, NSF
  - Add TNXB as a region on its own so that the full gene can be genotyped (the RCCX region only includes part of TNXB)
- Algorithmic changes
  - Improve fusion calling in cases of homozygous deletion
  - Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
  - Update a few gene-specific callers
    - hba: Add calling of 4.2 deletion/duplication
    - smn1: If homozygous throughout region, default to CN = 2 instead of 1; Drop carrier call if only one SMN1 haplotype is found but the total CN of SERF1A/B (neighboring locus) is larger than the total CN of SMN1/2
    - ikbkg: Improve calling of the 11.7kb deletion; Update the config to genotype the entire gene
    - ncf1: Drop carrier call if only one NCF1 haplotype is found but the total CN of GTF2I (neighboring locus) is larger than the total CN of NCF1 family
    - rccx: Better handle homozygous deletion cases
    - pms2: Update the config to genotype the entire gene
- Other changes:
  - Support cram as input
  - Standardize haplotype naming across regions: {gene name}_{haplotype name}
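
The cutoff formula given for --min-read-variant above, min(this number, max(5, depth*0.11)), can be worked through with a short sketch. The function name is hypothetical, and whether Paraphase rounds the result is not stated in these notes, so the value is left unrounded here.

```python
def variant_read_cutoff(depth: float, min_read_variant: int = 20) -> float:
    """Cutoff for variant-supporting reads, following the formula in the notes:
    min(min_read_variant, max(5, depth * 0.11))."""
    return min(min_read_variant, max(5, depth * 0.11))


# Example depths are arbitrary: roughly WGS-like, moderate, and high coverage.
for depth in (30, 100, 500):
    print(depth, variant_read_cutoff(depth))
```

At low depth the floor of 5 applies, at intermediate depth the depth-scaled term applies, and only at high depth does the --min-read-variant value itself become the cutoff. This is why raising it mainly matters for high-coverage targeted data.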
Version 3.1.2
Summary of changes:
- Add
--write-nocalls-in-vcfoption to write no-call sites in the VCF
Version 3.1.1
Summary of changes:
Minor update. Fix program error in low-depth or no-data regions. Completes analysis even when the input is a small bamlet (result is still a no-call).