Article · Genome Research · Jul 19, 2010 · Bronze OA

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Broad Institute · Massachusetts General Hospital

Indexed in Crossref and PubMed

Abstract

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult even for computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the…
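The title describes GATK as a MapReduce framework. As a rough illustration of that programming model only, the sketch below separates per-locus work (a map function applied independently to the reads covering each reference position) from aggregation (a reduce function that folds the per-locus results into one answer). Everything here is an assumption for illustration: reads are modeled as half-open [start, end) intervals on a single contig, and `traverse_loci`, `pileup`, `depth_map` and `sum_reduce` are hypothetical names, not GATK's API or its actual traversal engine.

```python
from functools import reduce
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # value produced by the map step for one locus
R = TypeVar("R")  # running value carried by the reduce step

# Hypothetical read model: a half-open interval [start, end) of
# reference positions on one contig (no bases, qualities, or CIGAR).
Read = Tuple[int, int]


def pileup(reads: List[Read], locus: int) -> List[Read]:
    """Return the reads that overlap a single reference position."""
    return [r for r in reads if r[0] <= locus < r[1]]


def traverse_loci(
    reads: List[Read],
    loci: Iterable[int],
    map_fn: Callable[[int, List[Read]], M],
    reduce_init: R,
    reduce_fn: Callable[[R, M], R],
) -> R:
    """Map each locus independently, then fold the mapped values left to right."""
    mapped = (map_fn(locus, pileup(reads, locus)) for locus in loci)
    return reduce(reduce_fn, mapped, reduce_init)


# Example analysis (hypothetical): depth of coverage.
def depth_map(locus: int, reads: List[Read]) -> int:
    """Map step: number of reads covering this locus."""
    return len(reads)


def sum_reduce(total: int, depth: int) -> int:
    """Reduce step: accumulate depth across loci."""
    return total + depth


if __name__ == "__main__":
    reads = [(0, 5), (2, 8), (4, 6)]
    total = traverse_loci(reads, range(10), depth_map, 0, sum_reduce)
    peak = traverse_loci(reads, range(10), depth_map, 0, max)
    print(total, peak)  # 13 3
```

Swapping only the reduce function (here `max` instead of `sum_reduce`) changes the summary without touching the map step. Because each map call depends only on its own locus, a framework organized this way could shard the loci and process them in parallel before reducing; this sketch runs sequentially for simplicity.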

Citation impact

29,833 total citations
27 references

Authors

11 authors

Topics & keywords

Keywords
  • Computer science
  • Genome
  • Correctness
  • 1000 Genomes Project
  • Set (abstract data type)
  • DNA sequencing
  • Biology
  • Computational biology
UN Sustainable Development Goals
  • Industry, innovation and infrastructure

Funding