The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
Broad Institute · Massachusetts General Hospital
Abstract
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult even for computationally sophisticated individuals. Indeed, the complexity of accessing and manipulating the data produced by these machines limits the scope and ease with which many professionals can answer scientific questions. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the…
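The title describes the GATK as a MapReduce framework, but the abstract is cut off before explaining what that means in practice. As a rough illustration only, the Python sketch below shows the general shape of a locus-by-locus map/reduce traversal over aligned reads. A traversal engine builds the pileup (all reads overlapping a position) at each locus. A user-supplied `map_fn` computes a per-locus value, and `reduce_fn` folds those values into a single result. All names here (`Read`, `traverse_loci`, `traverse_sharded`, `map_fn`, `reduce_fn`, `combine_fn`) are hypothetical and are not the GATK's actual API. The toy in-memory read list stands in for real alignment files.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

M = TypeVar("M")  # per-locus map result
R = TypeVar("R")  # accumulated reduce result


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal aligned read on one contig, 0-based half-open [start, end)."""
    name: str
    start: int
    end: int


def traverse_loci(reads: List[Read], region_start: int, region_end: int,
                  map_fn: Callable[[int, List[Read]], M],
                  reduce_init: R,
                  reduce_fn: Callable[[M, R], R]) -> R:
    """Visit each locus in [region_start, region_end), build its pileup,
    apply map_fn to it, and fold the results with reduce_fn."""
    acc = reduce_init
    for locus in range(region_start, region_end):
        # Naive pileup: rescans every read at every locus (O(loci * reads)).
        pileup = [r for r in reads if r.start <= locus < r.end]
        acc = reduce_fn(map_fn(locus, pileup), acc)
    return acc


def traverse_sharded(reads: List[Read], region_start: int, region_end: int,
                     shard_size: int,
                     map_fn: Callable[[int, List[Read]], M],
                     reduce_init: R,
                     reduce_fn: Callable[[M, R], R],
                     combine_fn: Callable[[R, R], R]) -> R:
    """Split the region into shards, traverse each one independently
    (these calls could run in parallel), then merge the per-shard results.
    Assumes reduce_init is an identity value for combine_fn."""
    result = reduce_init
    for shard_start in range(region_start, region_end, shard_size):
        shard_end = min(shard_start + shard_size, region_end)
        shard_result = traverse_loci(reads, shard_start, shard_end,
                                     map_fn, reduce_init, reduce_fn)
        result = combine_fn(result, shard_result)
    return result


reads = [Read("r1", 0, 5), Read("r2", 3, 8), Read("r3", 6, 10)]

# Analysis 1: total depth of coverage (sum of per-locus pileup sizes).
total_depth = traverse_loci(reads, 0, 10,
                            map_fn=lambda locus, pileup: len(pileup),
                            reduce_init=0,
                            reduce_fn=lambda depth, acc: acc + depth)

# Analysis 2: loci covered by at least two reads, computed in shards of 4 loci.
well_covered = traverse_sharded(reads, 0, 10, 4,
                                map_fn=lambda locus, pileup: 1 if len(pileup) >= 2 else 0,
                                reduce_init=0,
                                reduce_fn=lambda hit, acc: acc + hit,
                                combine_fn=lambda a, b: a + b)
print(total_depth, well_covered)  # 14 4
```

Keeping the traversal separate from the analysis is what lets a framework of this kind shard the genome. As long as `combine_fn` is associative and `reduce_init` is its identity, per-shard results can be merged regardless of how the region was split. The naive pileup above is only for clarity; a practical engine would stream coordinate-sorted reads rather than rescanning the whole list at each locus.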
Citation impact
- References: 27
Authors: 11
Topics & keywords
- Computer science
- Genome
- Correctness
- 1000 Genomes Project
- Set (abstract data type)
- DNA sequencing
- Biology
- Computational biology
- Industry, innovation and infrastructure