Preprint | arXiv (Cornell University) | Jul 17, 2013 | Green OA

Integrating sequencing datasets to form highly confident SNP and indel genotype calls for a whole human genome

Indexed in DataCite

Abstract

Clinical adoption of human genome sequencing requires methods with known accuracy of genotype calls at millions or billions of positions across a genome. Previous work showing discordance amongst sequencing methods and algorithms has made clear the need for a highly accurate set of genotypes across a whole genome that could be used as a benchmark. We present methods to make highly confident SNP, indel, and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias towards any method by integrating and arbitrating between 14 datasets from 5 sequencing technologies, 7 mappers, and 3 variant callers. Regions for which no confident genotype call could…

Citation impact

552 total citations

Authors

7

Topics & keywords

Keywords
  • Indel
  • Benchmarking
  • Genome
  • Genotype
  • Human genome
  • Computational biology
  • Benchmark (surveying)
  • Genomics
UN Sustainable Development Goals
  • Partnerships for the goals