Article · Bioinformatics · Jan 19, 2020 · Hybrid OA

Identifying and removing haplotypic duplication in primary genome assemblies

Harbin Institute of Technology · University of Cambridge · +1 more institution

PubMed
Indexed in: Crossref, DOAJ, PubMed

Abstract

MOTIVATION: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors.

RESULTS: Here we present a novel tool, purge_dups, that uses sequence similarity…

Citation impact

Total citations: 2,791
FWCI: 93.94
Percentile: 100%
References: 11

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Purge
  • Source code
  • Sequence assembly
  • Contiguity
  • Annotation
  • Genome
  • Gene duplication

Funding