TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads
BGI Group (China) · Botswana Geoscience Institute · +3 more institutions
Abstract
Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited.
We developed TGS-GapCloser, a software tool that closes sequence gaps in genome assemblies using low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge the gap region between 2 contigs within a scaffold, error-corrects only those candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, filling up to 94.8% of gaps with sequence data at 97.7% positive predictive value. Despite the high raw error rate of single-molecule reads, the improved assemblies achieve 99.998% (Q46) single-base accuracy, and the final inserted sequences have 99.97% (Q35) accuracy. This enables high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as that of ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.
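The scaftig NG50 metric reported above can be illustrated with a minimal sketch. It assumes the usual definitions: a scaftig is a gap-free stretch of a scaffold (split at runs of N), and NG50 is the length of the shortest piece in the smallest set of longest pieces that together cover at least half of the estimated genome size. The function names `scaftigs` and `ng50`, the `min_gap` parameter, and the toy sequences are hypothetical and are not taken from TGS-GapCloser itself.

```python
import re


def scaftigs(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into scaftigs at runs of at least min_gap N characters."""
    pattern = r"[Nn]{%d,}" % min_gap
    return [piece for piece in re.split(pattern, scaffold) if piece]


def ng50(lengths: list[int], genome_size: int) -> int:
    """Length of the piece at which the cumulative sum of pieces, taken from
    longest to shortest, first reaches half of genome_size.
    Returns 0 if the pieces never cover half of the genome."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Toy scaffold with two gaps (assumed data, for illustration only).
before = "ACGTACGT" + "N" * 5 + "ACGT" + "N" * 3 + "ACGTAC"
# The same scaffold after the first gap has been filled with read-derived sequence.
after = "ACGTACGT" + "TTGCA" + "ACGT" + "N" * 3 + "ACGTAC"

genome_size = 30  # assumed estimated genome size for the toy example
ng50_before = ng50([len(s) for s in scaftigs(before)], genome_size)
ng50_after = ng50([len(s) for s in scaftigs(after)], genome_size)
print(ng50_before, ng50_after)
```

Filling a single gap merges two scaftigs into one longer piece, which is why gap closing can raise the scaftig NG50 substantially even when the total amount of inserted sequence is small.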
Citation impact
- FWCI: 14.09
- Percentile: 100%
- References: 48
Authors (11)
- Mengyang Xu
BGI Group (China), Botswana Geoscience Institute
- Lidong Guo
BGI Group (China), Botswana Geoscience Institute, University of Chinese Academy of Sciences
- Shengqiang Gu
BGI Group (China), Botswana Geoscience Institute, University of Chinese Academy of Sciences
- Ou Wang
BGI Group (China)
- Rui Zhang
BGI Group (China), Botswana Geoscience Institute
Topics & keywords
- Contig
- Genome
- Reference genome
- Sequence assembly
- Nanopore sequencing
- Computer science
- Computational biology
- Whole genome sequencing
- Life below water