Substantial biases in ultra-short read data sets from high-throughput DNA sequencing
Max Planck Institute for Molecular Genetics · University of Regensburg · +1 more institution
Abstract
Novel sequencing technologies permit the rapid production of large sequence data sets. These technologies are likely to revolutionize genetics and biomedical research, but a thorough characterization of the ultra-short read output is necessary. We generated and analyzed two Illumina 1G ultra-short read data sets, i.e. 2.8 million 27mer reads from a Beta vulgaris genomic clone and 12.3 million 36mers from the Helicobacter acinonychis genome. We found that error rates range from 0.3% at the beginning of reads to 3.8% at the end of reads. Wrong base calls are frequently preceded by base G. Base substitution error frequencies vary by 10- to 11-fold, with A > C transversion being among the most frequent and C > G…
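The three quantities the abstract reports (error rate by read position, the base preceding a wrong call, and substitution frequencies) can all be tallied from reads aligned to a known reference. The following is a minimal sketch, not the authors' pipeline: the function name `profile_errors`, the ungapped equal-length alignment input, and the choice to take the preceding base from the reference rather than from the read are all assumptions made for illustration.

```python
from collections import Counter


def profile_errors(pairs):
    """Tally mismatches from ungapped (read, reference) string pairs.

    Hypothetical helper: assumes each read is already aligned without
    gaps to a reference segment of the same length.
    """
    totals = Counter()         # read position -> bases compared
    errors = Counter()         # read position -> mismatching base calls
    substitutions = Counter()  # (reference base, called base) -> count
    preceding = Counter()      # reference base just before a wrong call -> count

    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segment must have equal length")
        for i, (called, true) in enumerate(zip(read, ref)):
            if called == "N" or true == "N":
                continue  # skip ambiguous bases
            totals[i] += 1
            if called != true:
                errors[i] += 1
                substitutions[(true, called)] += 1
                if i > 0:
                    # Assumption: context is taken from the reference sequence.
                    preceding[ref[i - 1]] += 1

    rates = {i: errors[i] / totals[i] for i in sorted(totals)}
    return rates, substitutions, preceding


if __name__ == "__main__":
    toy_pairs = [("ACGT", "ACGT"), ("ACGA", "ACGT"), ("AGCT", "AGTT")]
    rates, substitutions, preceding = profile_errors(toy_pairs)
    for position, rate in rates.items():
        print(position, rate)
    print(dict(substitutions))
    print(dict(preceding))
```

Keying `substitutions` by (reference base, called base) mirrors the "A > C" notation in the abstract; comparing the largest and smallest counts would give a fold difference of the kind reported there.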
Citation impact
- FWCI: 38.46
- Percentile: 100%
- References: 22
Authors
- 4
Topics & keywords
- Biology
- Transversion
- Genetics
- DNA sequencing
- Deep sequencing
- Computational biology
- Illumina dye sequencing
- Hybrid genome assembly