4.1.6.0
Download release: gatk-4.1.6.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.1.6.0 release:
- `Funcotator` now supports ENSEMBL GTF files (and non-human species)
- A beta port of the GATK3 tool `DepthOfCoverage`, a tool to assess sequence coverage by a wide array of metrics, partitioned by sample, read group, library, or gene (#5913)
- Several important bug fixes and enhancements to `HaplotypeCaller` and `Mutect2`, including:
  - A fix for an often-reported issue where `HaplotypeCaller` could produce reads starting with deletions during the realignment step and error out.
  - A fix for another often-reported issue where `Mutect2` could emit MNPs despite `--max-mnp-distance` being 0, causing downstream errors in `GenomicsDB` about MNPs not being supported.
Full list of changes:
- New Tools
  - A beta port of the GATK3 tool `DepthOfCoverage`, a tool to assess sequence coverage by a wide array of metrics, partitioned by sample, read group, library, or gene (#5913)
    - This port fixes several bugs and changes some behavior present in the GATK3 version:
      - Fixed a longstanding bug in GATK3 DepthOfCoverage where using multiple partition types caused the column header and body lines to be ordered differently, producing incorrect output.
      - The old version merged adjacent and overlapping intervals when generating interval summary files. In GATK4, adjacent and overlapping intervals are instead tabulated as separate lines in the output (this also applies to gene lists, which were previously merged as well).
      - Gene list coverage no longer counts introns when generating interval summaries for gene lists.
      - Added support for RefSeqGeneList files as optional gene list input.
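
The interval-summary change described above can be pictured with a small toy example. The sketch below is not GATK code; the function names and the `(contig, start, end)` tuple representation are assumptions made purely for illustration. It contrasts merging adjacent and overlapping intervals (the old GATK3-style summary) with keeping one row per input interval (the GATK4 behavior described in these notes).

```python
from typing import List, Tuple

# (contig, start, end) with 1-based inclusive coordinates; an assumed toy representation.
Interval = Tuple[str, int, int]


def merge_adjacent_and_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Toy GATK3-style summary: adjacent or overlapping intervals collapse into one row."""
    merged: List[Interval] = []
    # Plain tuple sorting is enough for this toy; real tools sort by sequence dictionary order.
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev_contig, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


def tabulate_separately(intervals: List[Interval]) -> List[Interval]:
    """Toy GATK4-style summary: every input interval keeps its own row."""
    return sorted(intervals)


intervals = [
    ("chr1", 100, 200),
    ("chr1", 201, 300),  # adjacent to the first interval
    ("chr1", 250, 400),  # overlaps the second interval
    ("chr2", 1, 50),
]

print(merge_adjacent_and_overlapping(intervals))  # two rows
print(tabulate_separately(intervals))             # four rows
```

With the merged form, the three `chr1` intervals become a single `chr1:100-400` row; with separate tabulation, each interval is reported on its own line, so per-interval numbers from the two versions are not directly comparable.
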
- HaplotypeCaller
  - Fixed a bug where single-base intervals led to no calls (#6507)
    - This fixes the issue reported in #6495 "HaplotypeCaller doesn't detect alternate alleles with 1 bp intervals"
  - Clean leading deletions from reads realigned to best haplotypes (#6498)
    - This fixes the issue reported in #6490 "HaplotypeCaller might be producing bogus reads with deletions at their alignment start during realignment to best haplotype step"
  - Fixed an edge case when haplotypes have leading insertion after trimming (#6518)
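
To make "cleaning leading deletions" concrete, here is a minimal sketch of the idea on plain CIGAR strings. It is not the GATK implementation; the helper names are hypothetical, and it assumes that a deletion at the start of the alignment (optionally preceded by clipping) can be dropped by advancing the alignment start by the deletion length, since a deletion consumes reference bases but no read bases.

```python
import re
from typing import List, Tuple

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '2D10M' into [(2, 'D'), (10, 'M')]."""
    elements = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return elements


def clean_leading_deletions(start: int, cigar: str) -> Tuple[int, str]:
    """Drop deletions that precede the first aligned base and shift the start.

    Hypothetical helper: leading soft/hard clips are kept as they are, and each
    leading deletion moves the alignment start forward by its length.
    """
    cleaned: List[Tuple[int, str]] = []
    seen_aligned = False  # True once an element other than a clip or leading deletion is seen
    for length, op in parse_cigar(cigar):
        if not seen_aligned and op in "SH":
            cleaned.append((length, op))
        elif not seen_aligned and op == "D":
            start += length  # the deleted reference bases are skipped
        else:
            seen_aligned = True
            cleaned.append((length, op))
    return start, "".join(f"{n}{op}" for n, op in cleaned)


print(clean_leading_deletions(100, "2D10M"))    # (102, '10M')
print(clean_leading_deletions(100, "5S2D10M"))  # (102, '5S10M')
print(clean_leading_deletions(100, "10M2D5M"))  # (100, '10M2D5M'), internal deletion untouched
```
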
- Mutect2
  - `Mutect2` can now filter MNVs with orientation bias (#6486)
  - Added an experimental pileup-based read error corrector, which in our evaluations reduces false positives and improves speed at no cost to sensitivity (#6470)
  - Switched CigarBuilder's order for adjacent indels to be deletion first (#6510)
    - Fixes #6473 "Mutect2 (GATK 4.1.5.0) emitting MNPs despite max-mnp-distance 0"
    - This also resolves downstream errors in `GenomicsDB` about not supporting MNPs
  - Fixed several bugs involving `getReadCoordinateForReferenceCoordinate()` (#6485)
    - Fixes #6342 "Mutect2 occasionally writes nonsense / invalid values for MPOS info tag"
    - Fixes #6314 "GATK4.1.3.0 Mutect2 enable-all-annotations option error"
    - Fixes #6294 "ReadPosRankSumTest with leading insertions"
    - Fixes #5492 "ReadPosRankSumTest doesn't work for two deletions with one base in between"
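
The "deletion first" ordering for adjacent indels can be illustrated on plain CIGAR strings. The following is a toy normalization written for this note, not GATK's `CigarBuilder`; the function name is hypothetical, and it assumes that a run of directly adjacent insertion and deletion elements is rewritten as one deletion followed by one insertion.

```python
import re
from typing import List, Tuple

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_first(cigar: str) -> str:
    """Rewrite each run of adjacent I/D elements as <total D><total I> (toy sketch)."""
    elements: List[Tuple[int, str]] = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")

    out: List[Tuple[int, str]] = []
    i = 0
    while i < len(elements):
        if elements[i][1] in ("I", "D"):
            # Sum up the whole run of consecutive insertions and deletions.
            deleted = inserted = 0
            while i < len(elements) and elements[i][1] in ("I", "D"):
                length, op = elements[i]
                if op == "D":
                    deleted += length
                else:
                    inserted += length
                i += 1
            if deleted:
                out.append((deleted, "D"))
            if inserted:
                out.append((inserted, "I"))
        else:
            out.append(elements[i])
            i += 1
    return "".join(f"{n}{op}" for n, op in out)


print(deletion_first("10M2I3D5M"))  # 10M3D2I5M
print(deletion_first("10M3D2I5M"))  # 10M3D2I5M (already deletion first)
```

Fixing one canonical order means the same alignment is always written the same way, regardless of the order in which the indel elements were added.
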
- Funcotator
  - `Funcotator` now supports ENSEMBL GTF files (and non-human species) (#6477) (#6492)
    - Users can now create datasources for any species for which ENSEMBL has an annotated GTF file and the corresponding coding region FASTA file
    - When creating new data sources, the user must still use `gencode` as the parent folder for the GTF data source subfolders. For example, for E. coli MG1655:
      - DATASOURCES
        - gencode
          - ASM584v2
            - Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.44.gtf
            - Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.cds.all.fa
            - …
            - gencode.config
    - For more information on creating data sources see the Funcotator tutorial on the GATK Forums.
    - An example datasource for E. coli MG1655 can be found in the large test files for Funcotator
    - For ENSEMBL datasources for vertebrates: ftp://ftp.ensembl.org/pub/
    - For ENSEMBL datasources for other species: ftp://ftp.ensemblgenomes.org/pub/
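
As a rough aid for checking the folder layout from the E. coli example, the sketch below builds the expected paths under a `DATASOURCES` root and reports any that are missing. It is only an illustration: the helper names are hypothetical, the file names are copied from the example above, the elided entry (`…`) is left out, and the placeholder files it creates are empty rather than valid GTF, FASTA, or config files. It assumes `gencode.config` sits in the same `ASM584v2` folder as the GTF and FASTA files.

```python
from pathlib import Path
from typing import List
import tempfile

# File names copied from the E. coli MG1655 example in the release notes.
EXAMPLE_FILES = [
    "Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.44.gtf",
    "Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.cds.all.fa",
    "gencode.config",
]


def expected_paths(root: Path, version: str, file_names: List[str]) -> List[Path]:
    """Paths expected under <root>/gencode/<version>/ (hypothetical helper)."""
    return [root / "gencode" / version / name for name in file_names]


def missing_paths(root: Path, version: str, file_names: List[str]) -> List[Path]:
    """Return the expected paths that do not exist as files."""
    return [p for p in expected_paths(root, version, file_names) if not p.is_file()]


# Usage: build an empty skeleton in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "DATASOURCES"
    for path in expected_paths(root, "ASM584v2", EXAMPLE_FILES):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # empty placeholder, not a real GTF/FASTA/config
    print(missing_paths(root, "ASM584v2", EXAMPLE_FILES))  # prints []
```
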
- CNV Calling
- Miscellaneous Changes
  - Simplified cigar and clipping code; added tests and fixed a few bugs including #6130 (#6403)
  - Refactored and enhanced ArgumentsBuilder (#6474)
  - Allow all GATKSparkTools to set the SBI index granularity (#6458)
  - Delete NioBam and related classes (#6479)
  - Clean up old interval code (#6465)
  - Remove duplicate copy of the NIO prefetching code (#6464)
  - Fix ignored test in GATKReadAdaptersUnitTest (#6471)
  - Fix alternate spellings of De Bruijn in the codebase (#6472)
- Documentation
  - Fix a broken set of javadoc references in FeatureDataSource (#6478)