4.1.7.0

@droazen droazen released this 23 Apr 23:16
4ec2a4b

Download release: gatk-4.1.7.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.1.7.0 release:

  • Added allele-specific filtering to the mitochondrial pipeline.

    • Allele-specific filtering is important for mitochondrial calling because there are many more multi-allelic sites than in the germline autosome.
  • A fix for the frequently-encountered "Smith-Waterman alignment failure" error in HaplotypeCaller and Mutect2

  • Initial support for http(s) paths for BAM inputs, including signed URLs

  • A new tool, DownsampleByDuplicateSet, to randomly sample a fraction of duplicate sets from an input BAM sorted by UMI

Full list of changes:

  • New Tools

    • DownsampleByDuplicateSet: a new tool to randomly sample a fraction of the duplicate sets in an input BAM sorted by UMI. (#6512)
      • Given a BAM grouped by unique molecular identifier (UMI), this tool drops a specified fraction of duplicate sets and returns a new BAM.
      • A duplicate set is a group of reads whose fragments start and end at the same genomic coordinates and share the same UMI.
      • The input BAM must first be sorted by UMI using fgbio GroupReadsByUmi.
      • Use this tool to create, for instance, an in silico mixture of duplex-sequenced samples to simulate tumor subclones.
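The duplicate-set sampling described above can be illustrated with a minimal Python sketch. This is not the GATK implementation: the read representation (dicts with `contig`, `start`, `end`, and `umi` keys), the `fraction_to_keep` parameter, and the `seed` parameter are all hypothetical names chosen for illustration. The sketch only shows the core idea that sampling happens per duplicate set, so the reads in a set are kept or dropped together.

```python
import random
from collections import defaultdict


def downsample_duplicate_sets(reads, fraction_to_keep, seed=0):
    """Keep a random fraction of duplicate sets (illustrative sketch only).

    `reads` is an iterable of dicts with the hypothetical keys
    'contig', 'start', 'end', and 'umi'. Reads whose fragments share
    the same coordinates and the same UMI form one duplicate set.
    """
    rng = random.Random(seed)

    # Group reads into duplicate sets keyed by fragment coordinates and UMI.
    duplicate_sets = defaultdict(list)
    for read in reads:
        key = (read["contig"], read["start"], read["end"], read["umi"])
        duplicate_sets[key].append(read)

    # Make one keep/drop decision per set, never per read.
    kept = []
    for members in duplicate_sets.values():
        if rng.random() < fraction_to_keep:
            kept.extend(members)
    return kept
```

Because the decision is made once per set, a downsampled file built this way never contains a partial duplicate set, which is what matters when the sets are later collapsed into consensus reads.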
  • HaplotypeCaller/Mutect2

    • Fixed a regression in HaplotypeCaller and Mutect2 where alt haplotypes with a deletion at the end of the padded region caused exceptions (#6544)
      • This bug produced error messages like the following: "Smith-Waterman alignment failure. Cigar = 275M with reference length 275 but expecting reference length of 303"
    • Fixed an ArrayIndexOutOfBoundsException in GenotypeUtils.computeDiploidGenotypeCounts() caused by mistakenly assuming ploidy two for no-calls (#6563)
    • Added more control over scattering in the Mutect2 PON WDL to allow arbitrarily fine scattering, reducing the memory required for downstream runs of GenomicsDBImport (#6527)
    • Inverted the --correct-overlapping-quality argument in HaplotypeCaller, replacing it with --do-not-correct-overlapping-quality (#6528)
  • Mitochondrial Pipeline

    • Added allele-specific filtering to the mitochondrial pipeline (#6399)
      • Allele-specific filtering is important for mitochondria because there are many more multi-allelic sites than in the germline autosome. Filtering each allele separately, rather than the whole site, gives downstream tools access to more of the good allele data.
      • These Mutect2 filters used in the MT pipeline are now allele-specific: weak_evidence, base_qual, map_qual, duplicate, strand_bias, strand_artifact, position, contamination, and low_allele_frac.
      • They are added to the AS_FilterStatus annotation in the INFO field.
      • The numt_chimera and numt_novel filters have been replaced by the possible_numt filter.
      • Two new filtering tools have been added: NuMTFilterTool for the possible_numt filter and MTLowHeteroplasmyFilterTool for the mt_many_low_hets filter, both of which are allele-specific.
      • The --split-multi-allelics option of the LeftAlignAndTrimVariants tool now splits the annotations in the FORMAT and INFO fields that are of type A and R (allele-specific, and allele-specific with reference).
      • The VariantFiltration tool now has an --apply-allele-specific-filters option that applies masks at the allele level. Previously, a site that should not have been masked was masked anyway if it had a deletion spanning a masked site. With this option, only the alleles that span the masked site are masked.
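The splitting of type A and type R annotations mentioned for --split-multi-allelics can be illustrated with a short Python sketch. It is not the LeftAlignAndTrimVariants code: the function name, the dict-based record layout, and the `number_types` mapping are assumptions made for illustration, and allele trimming and the INFO/FORMAT distinction are ignored. It relies only on the VCF convention that a type A annotation carries one value per alternate allele and a type R annotation carries one value per allele with the reference first.

```python
def split_multiallelic(ref, alts, annotations, number_types):
    """Split one multi-allelic record into one biallelic record per alt allele.

    annotations:  annotation key -> list of values (or a scalar)
    number_types: annotation key -> 'A' or 'R'; any other key is copied as-is
    All names here are hypothetical; this is a sketch, not the GATK code.
    """
    records = []
    for i, alt in enumerate(alts):
        split = {}
        for key, values in annotations.items():
            kind = number_types.get(key)
            if kind == "A":
                # One value per alt allele: keep the value for this alt.
                split[key] = [values[i]]
            elif kind == "R":
                # One value per allele, reference first: keep ref and this alt.
                split[key] = [values[0], values[i + 1]]
            else:
                # Not allele-specific: copy unchanged.
                split[key] = values
        records.append({"ref": ref, "alt": alt, "annotations": split})
    return records


records = split_multiallelic(
    ref="A",
    alts=["C", "G"],
    annotations={"AF": [0.1, 0.02], "AD": [100, 10, 2], "DP": 112},
    number_types={"AF": "A", "AD": "R"},
)
```

In this example the first record keeps `AF = [0.1]` and `AD = [100, 10]`, the second keeps `AF = [0.02]` and `AD = [100, 2]`, and `DP` is copied to both unchanged.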
  • GATK Engine

    • Added initial support for http(s) paths for BAM inputs, including signed URLs (#6526)
  • Miscellaneous Changes

    • Exposed maximum copy ratio and point size for CNV plotting tools (#6482)
    • Decreased an epsilon value in VariantRecalibrator so that our production exome joint genotyping tests pass (#6534)
    • Migrated reference arguments and downstream code to GATKPathSpecifier (#6524)
    • Removed the obsolete isCompatibleWithSparkBroadcast() method (#6523)
  • Documentation

    • Cleaned up the handling of some missing values in auto-generated GATK tool documentation (#6565)
      • The docs no longer include null, "", or [] in the default value list.
    • Added a README for the CNN variant scoring workflow, and added an input JSON for Mutect2 workflow files located in GCS buckets (#6542)
    • Fixed a typo in a ploidy prior example in the docs for DetermineGermlineContigPloidy (#6531)