Article · Nucleic Acids Research · Jul 26, 2008 · Gold OA

Substantial biases in ultra-short read data sets from high-throughput DNA sequencing

Max Planck Institute for Molecular Genetics · University of Regensburg · +1 more institution

Indexed in: Crossref · DataCite · DOAJ · PubMed

Abstract

Novel sequencing technologies permit the rapid production of large sequence data sets. These technologies are likely to revolutionize genetics and biomedical research, but a thorough characterization of the ultra-short read output is necessary. We generated and analyzed two Illumina 1G ultra-short read data sets, i.e. 2.8 million 27mer reads from a Beta vulgaris genomic clone and 12.3 million 36mers from the Helicobacter acinonychis genome. We found that error rates range from 0.3% at the beginning of reads to 3.8% at the end of reads. Wrong base calls are frequently preceded by base G. Base substitution error frequencies vary by 10- to 11-fold, with A > C transversion being among the most frequent and C > G…
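
The kind of measurement summarized in the abstract (error rate by read position, substitution type, and the base preceding a wrong call) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each read has already been paired with an equal-length, ungapped reference segment, and the function names and toy data are hypothetical.

```python
from collections import Counter

def position_error_rates(pairs):
    """Fraction of reads with a mismatch at each read position.

    Assumes every (read, reference) pair is ungapped and of equal length.
    """
    length = len(pairs[0][0])
    mismatches = [0] * length
    for read, ref in pairs:
        for i, (called, true) in enumerate(zip(read, ref)):
            if called != true:
                mismatches[i] += 1
    return [m / len(pairs) for m in mismatches]

def substitution_counts(pairs):
    """Count (reference base, called base) pairs over all mismatches."""
    counts = Counter()
    for read, ref in pairs:
        for called, true in zip(read, ref):
            if called != true:
                counts[(true, called)] += 1
    return counts

def preceding_base_counts(pairs):
    """Count the reference base immediately before each mismatch.

    Using the reference base (rather than the called base) is an
    assumption of this sketch; mismatches at position 0 are skipped.
    """
    counts = Counter()
    for read, ref in pairs:
        for i in range(1, len(read)):
            if read[i] != ref[i]:
                counts[ref[i - 1]] += 1
    return counts

# Toy data (hypothetical): four 4-mer reads against the same reference.
pairs = [
    ("ACGT", "ACGT"),
    ("ACGA", "ACGT"),
    ("ACCT", "ACGT"),
    ("ACGC", "ACGT"),
]
rates = position_error_rates(pairs)
subs = substitution_counts(pairs)
preceding = preceding_base_counts(pairs)
```

On real data the pairs would come from read alignments, and indels would need separate handling that this sketch omits.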


Funding