4.1.7.0
Download release: gatk-4.1.7.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.1.7.0 release:
- Added allele-specific filtering to the mitochondrial pipeline.
  - Allele-specific filtering is important for mitochondrial calling because there are many more multi-allelic sites than in the germline autosome.
- A fix for the frequently encountered "Smith-Waterman alignment failure" error in HaplotypeCaller and Mutect2
- Initial support for http(s) paths for BAM inputs, including signed URLs
- A new tool, DownsampleByDuplicateSet, to randomly sample a fraction of duplicate sets from an input BAM sorted by UMI
Full list of changes:
- New Tools
  - DownsampleByDuplicateSet: a new tool to randomly sample a fraction of the duplicate sets in an input BAM sorted by UMI (#6512)
    - Given a BAM grouped by unique molecular identifier (UMI), this tool drops a specified fraction of duplicate sets and returns a new BAM.
    - A duplicate set refers to a group of reads whose fragments start and end at the same genomic coordinates and share the same UMI.
    - The input BAM must first be sorted by UMI using FGBio GroupReadsByUmi.
    - Use this tool to create, for instance, an in silico mixture of duplex-sequenced samples to simulate tumor subclones.
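As a rough illustration of the behavior described above, the following Python sketch drops a fraction of duplicate sets from a list of reads that is already grouped by molecule. It is not the GATK implementation: the `Read` record, its `molecule_id` field (standing in for whatever per-molecule identifier the UMI grouping step assigns), the `keep_fraction` parameter, and the function name `downsample_by_duplicate_set` are all hypothetical names chosen for the example. The property it aims to show is that the keep-or-drop decision is made once per duplicate set, so a set is either retained whole or removed whole.

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for a BAM record; only the fields the sketch needs."""
    name: str
    molecule_id: str  # assumed: identical for every read in a duplicate set


def downsample_by_duplicate_set(
    reads: Iterable[Read], keep_fraction: float, seed: Optional[int] = None
) -> Iterator[Read]:
    """Yield the reads of a randomly chosen subset of duplicate sets.

    Assumes `reads` is ordered so that reads from the same duplicate set are
    adjacent, which is the role the UMI-sorting step plays in the text above.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)
    for _, duplicate_set in groupby(reads, key=lambda r: r.molecule_id):
        # One random draw per duplicate set, not per read.
        if rng.random() < keep_fraction:
            yield from duplicate_set


# Ten hypothetical duplicate sets of three reads each.
reads = [Read(f"read{i}", f"mol{i // 3}") for i in range(30)]
kept = list(downsample_by_duplicate_set(reads, keep_fraction=0.5, seed=0))
```

Because the draw happens per set, the number of reads kept varies with set sizes; only the expected fraction of sets is controlled.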
- HaplotypeCaller/Mutect2
  - Fixed a regression in HaplotypeCaller and Mutect2 where alt haplotypes with a deletion at the end of the padded region caused exceptions (#6544)
    - This bug produced error messages like the following: "Smith-Waterman alignment failure. Cigar = 275M with reference length 275 but expecting reference length of 303"
  - Fixed an ArrayIndexOutOfBoundsException in GenotypeUtils.computeDiploidGenotypeCounts() caused by mistakenly assuming ploidy two for no-calls (#6563)
  - Added more control over scattering in the Mutect2 PON WDL to allow arbitrarily fine scattering, reducing the memory required for downstream runs of GenomicsDBImport (#6527)
  - Inverted the --correct-overlapping-quality argument in HaplotypeCaller to --do-not-correct-overlapping-quality (#6528)
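To make the quoted error message easier to read: the "reference length" of a CIGAR is the number of reference bases the alignment spans, so 275M covers 275 bases, whereas the caller expected the alignment to cover 303. The sketch below computes that span using the SAM specification's rule for which operators consume reference bases (M, D, N, = and X). The function name `cigar_reference_length` is hypothetical, the second example CIGAR is made up for illustration, and none of this is GATK's own code; it only shows the quantity the message reports.

```python
import re

# CIGAR operators that consume reference bases, per the SAM specification.
REFERENCE_CONSUMING_OPS = frozenset("MDN=X")
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_reference_length(cigar: str) -> int:
    """Return the number of reference bases spanned by a CIGAR string."""
    elements = CIGAR_ELEMENT.findall(cigar)
    # Reject strings containing anything the pattern did not account for.
    if "".join(length + op for length, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(length) for length, op in elements if op in REFERENCE_CONSUMING_OPS)


print(cigar_reference_length("275M"))     # 275, the length reported in the message
print(cigar_reference_length("275M28D"))  # 303, a hypothetical CIGAR with a trailing deletion
```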
- Mitochondrial Pipeline
  - Added allele-specific filtering to the mitochondrial pipeline (#6399)
    - Allele-specific filtering is important for mitochondria because there are many more multi-allelic sites than in the germline autosome. Filtering per allele rather than per site gives downstream tools access to more of the good allele data.
    - These Mutect2 filters used in the MT pipeline are now allele-specific: weak_evidence, base_qual, map_qual, duplicate, strand_bias, strand_artifact, position, contamination, and low_allele_frac.
    - They are added to the AS_FilterStatus annotation in the INFO field.
    - The numt_chimera and numt_novel filters have been replaced by the possible_numt filter.
    - Two new filtering tools have been added: NuMTFilterTool for the possible_numt filter and MTLowHeteroplasmyFilterTool for the mt_many_low_hets filter, both of which are allele-specific.
    - The --split-multi-allelics option of the LeftAlignAndTrimVariants tool now splits the annotations in the FORMAT and INFO fields that are of type A and R (allele-specific, and allele-specific with reference).
    - The VariantFiltration tool now has an --apply-allele-specific-filters option that will apply masks at the allele level. Before this addition, sites that should not be masked but had deletions spanning a masked site would have been masked. Now, if this option is specified, only the alleles spanning the masked site will be masked.
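The splitting of type A and type R annotations mentioned above can be sketched as follows. In VCF, a Number=A field carries one value per alternate allele and a Number=R field carries one value per allele with the reference first, so splitting a multi-allelic site into biallelic records means giving each record its own A value, plus the reference value and its own R value. The function `split_allele_annotations` and the example values are hypothetical; this is a minimal illustration of the idea under those assumptions, not the LeftAlignAndTrimVariants implementation.

```python
from typing import Dict, List, Sequence


def split_allele_annotations(
    alt_alleles: Sequence[str],
    type_a: Dict[str, Sequence],
    type_r: Dict[str, Sequence],
) -> List[dict]:
    """Split per-allele annotations of one multi-allelic site, one dict per alt.

    type_a values hold one entry per alt allele (VCF Number=A).
    type_r values hold one entry per allele, reference first (VCF Number=R).
    """
    n_alts = len(alt_alleles)
    for key, values in type_a.items():
        if len(values) != n_alts:
            raise ValueError(f"{key}: expected {n_alts} values, got {len(values)}")
    for key, values in type_r.items():
        if len(values) != n_alts + 1:
            raise ValueError(f"{key}: expected {n_alts + 1} values, got {len(values)}")

    records = []
    for i, alt in enumerate(alt_alleles):
        annotations = {key: [values[i]] for key, values in type_a.items()}
        # Keep the reference entry (index 0) alongside this alt's own entry.
        annotations.update(
            {key: [values[0], values[i + 1]] for key, values in type_r.items()}
        )
        records.append({"alt": alt, "annotations": annotations})
    return records


# Hypothetical site with two alt alleles: AF is Number=A, AD is Number=R.
for record in split_allele_annotations(
    ["T", "G"], type_a={"AF": [0.30, 0.05]}, type_r={"AD": [65, 30, 5]}
):
    print(record)
```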
- GATK Engine
  - Added initial support for http(s) paths for BAM inputs, including signed URLs (#6526)
- Miscellaneous Changes
  - Exposed maximum copy ratio and point size for CNV plotting tools (#6482)
  - Decreased an epsilon value in VariantRecalibrator so that our production exome joint genotyping tests pass (#6534)
  - Migrated reference arguments and downstream code to GATKPathSpecifier (#6524)
  - Removed the obsolete isCompatibleWithSparkBroadcast() method (#6523)
- Documentation
  - Cleaned up the handling of some missing values in auto-generated GATK tool documentation (#6565)
    - The docs no longer include null, "", or [] in the default value list.
  - Added a README for the CNN variant scoring workflow, and added an input JSON for Mutect2 workflow files located in GCS buckets (#6542)
  - Fixed a typo in a ploidy prior example in the docs for DetermineGermlineContigPloidy (#6531)