Article | Genome Research | Dec 7, 2011 | Bronze OA

Efficient de novo assembly of large genomes using compressed data structures

Wellcome Sanger Institute

PubMed
Indexed in: Crossref, PubMed

Abstract

De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory-efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA (String Graph Assembler). We describe algorithms to error-correct, assemble, and scaffold large sets of sequence data. SGA uses the overlap-based string graph model of assembly, unlike most de novo assemblers that rely on de Bruijn graphs, and is simply parallelizable. We demonstrate the error correction and assembly…
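To make the abstract's central data structure concrete, the following is a minimal sketch of an FM-index built from a Burrows-Wheeler transform, with the standard backward search used to count occurrences of a pattern. It is an illustration only and not SGA's implementation: the index here is uncompressed and built naively from sorted rotations, and the function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical. It assumes the text ends with a unique `$` terminator that sorts before every other symbol.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes `text` ends with a unique '$' that sorts before all other symbols.
    Naive construction, suitable only for short illustrative strings.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string (uncompressed).

    C[c]      : number of symbols in the text that are strictly smaller than c
    occ[c][i] : number of occurrences of c in bwt_str[:i]
    """
    alphabet = sorted(set(bwt_str))
    C = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Count occurrences of `pattern` in the indexed text by backward search.

    Maintains a half-open interval [lo, hi) of sorted-suffix rows whose
    suffixes start with the part of the pattern processed so far.
    """
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("GATTACATTA$")
    C, occ = build_fm_index(transformed)
    print(count_occurrences("TTA", C, occ, len(transformed)))
```

Each backward-search step touches only the `C` and `occ` tables, so a query costs time proportional to the pattern length regardless of text size. A memory-efficient index would store `occ` in sampled or compressed form rather than as full arrays as done here.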

Citation impact

Total citations: 816
References: 30

Authors

2

Topics & keywords

Keywords
  • Contig
  • Sequence assembly
  • k-mer
  • De Bruijn sequence
  • Genome
  • Hybrid genome assembly
  • Biology
  • Sequence (biology)

Funding