Substantial biases in ultra-short read data sets from high-throughput DNA sequencing

Dohm, Juliane C.; Lottaz, Claudio; Borodina, Tatiana; Himmelbauer, Heinz

doi:10.1093/nar/gkn425

Article · Nucleic Acids Research · Jul 26, 2008 · Gold OA

Max Planck Institute for Molecular Genetics · University of Regensburg · +1 more institution

Indexed in: Crossref, DataCite, DOAJ, PubMed

Abstract

Novel sequencing technologies permit the rapid production of large sequence data sets. These technologies are likely to revolutionize genetics and biomedical research, but a thorough characterization of the ultra-short read output is necessary. We generated and analyzed two Illumina 1G ultra-short read data sets, i.e. 2.8 million 27mer reads from a Beta vulgaris genomic clone and 12.3 million 36mers from the Helicobacter acinonychis genome. We found that error rates range from 0.3% at the beginning of reads to 3.8% at the end of reads. Wrong base calls are frequently preceded by base G. Base substitution error frequencies vary by 10- to 11-fold, with A > C transversion being among the most frequent and C > G…

Citation impact

Total citations: 1,108

FWCI: 38.46
Percentile: 100%
References: 22

Authors: 4

Topics & keywords

Keywords

Biology
Transversion
Genetics
DNA sequencing
Deep sequencing
Computational biology
Illumina dye sequencing
Hybrid genome assembly

Funding

Max-Planck-Gesellschaft