Identifying and removing haplotypic duplication in primary genome assemblies
Harbin Institute of Technology · University of Cambridge · +1 more institution
Abstract
MOTIVATION: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. RESULTS: Here we present a novel tool, purge_dups, that uses sequence similarity…
Citation impact
- FWCI: 93.94
- Percentile: 100%
- References: 11
Authors: 6
Topics & keywords
- Computer science
- Purge
- Source code
- Sequence assembly
- Contiguity
- Annotation
- Genome
- Gene duplication