Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences
Icahn School of Medicine at Mount Sinai · Northern Arizona University · +6 more institutions
Abstract
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though…
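The abstract does not spell out the individual steps, so the following is only a minimal toy sketch of how a subsampled open-reference workflow is commonly described: closed-reference assignment first, then de novo clustering of a random subsample of the failures, then closed-reference assignment of the remaining failures against the new centroids, and finally de novo clustering of whatever is left. The function names, the position-wise `identity` measure, the greedy clustering, and the `fraction` parameter are all illustrative assumptions, not the authors' implementation.

```python
import random


def identity(a, b):
    """Toy similarity (assumption): fraction of matching positions."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def closed_reference(reads, refs, threshold):
    """Assign each read to its best reference; reads below threshold fail.

    Each read is handled independently, so this step could run in parallel.
    """
    hits, failures = {}, []
    for read in reads:
        best = max(refs, key=lambda r: identity(read, r), default=None)
        if best is not None and identity(read, best) >= threshold:
            hits.setdefault(best, []).append(read)
        else:
            failures.append(read)
    return hits, failures


def de_novo(reads, threshold):
    """Greedy centroid clustering; inherently serial, so kept to small inputs."""
    clusters = {}
    for read in reads:
        for centroid in clusters:
            if identity(read, centroid) >= threshold:
                clusters[centroid].append(read)
                break
        else:
            clusters[read] = [read]
    return clusters


def subsampled_open_reference(reads, refs, threshold=0.97, fraction=0.1, seed=0):
    """Hypothetical four-step sketch; returns {centroid: [member reads]}."""
    rng = random.Random(seed)

    # Step 1: closed-reference assignment against the reference set.
    otus, failed = closed_reference(reads, refs, threshold)

    # Step 2: cluster a random subsample of the failures de novo.
    k = min(len(failed), max(1, int(len(failed) * fraction))) if failed else 0
    chosen = set(rng.sample(range(len(failed)), k))
    sample = [failed[i] for i in sorted(chosen)]
    rest = [failed[i] for i in range(len(failed)) if i not in chosen]
    new_clusters = de_novo(sample, threshold)

    # Step 3: closed-reference assignment of the rest against new centroids.
    hits, still_failed = closed_reference(rest, list(new_clusters), threshold)
    for centroid, members in hits.items():
        new_clusters[centroid].extend(members)

    # Step 4: de novo clustering of reads that still have no OTU.
    leftover = de_novo(still_failed, threshold)

    # Merge all OTUs so that every input read appears exactly once.
    result = {c: list(m) for c, m in otus.items()}
    for source in (new_clusters, leftover):
        for centroid, members in source.items():
            result.setdefault(centroid, []).extend(members)
    return result
```

In this sketch only steps 2 and 4 are serial, and both operate on reduced inputs, which is the intuition behind the runtime claim. Unlike closed-reference picking alone, no read is discarded.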
Citation impact
- FWCI: 26.46
- Percentile: 100%
- References: 21
Authors: 16
Topics & keywords
- Computer science
- Operational taxonomic unit
- Cluster analysis
- Amplicon sequencing
- Similarity (geometry)
- Amplicon
- Open source
- Software