Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
National Institutes of Health · National Human Genome Research Institute · +3 more institutions
Abstract
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new…
Citation impact
- FWCI: not reported
- Percentile: not reported
- References: 73
Authors (6)
- Sergey Koren (corresponding author)
National Institutes of Health, National Human Genome Research Institute
- Brian P. Walenz
National Institutes of Health, National Human Genome Research Institute
- Konstantin Berlin
Inova Fairfax Hospital
- Jason Miller
J. Craig Venter Institute
- Nicholas H. Bergman
Chemring Countermeasures (United Kingdom)
Topics & keywords
- Biology
- Weighting
- Separation (statistics)
- Computational biology
- Genetics
- Computer science
- Machine learning
- Physics
- Life below water
Funding
- National Science Foundation. Award: NSF IOS-1237993
- U.S. Department of Homeland Security. Award: HSHQDC-07-C-00020
- Battelle
- National Institutes of Health. Award: HSHQDC-07-C-00020
- Science and Technology Directorate. Award: HSHQDC-07-C-00020
- National Human Genome Research Institute
- Division of Integrative Organismal Systems