KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies
Norwich Research Park · Earlham Institute
Abstract
Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies.
Results: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to…
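To make "k-mer frequencies and GC composition" concrete, the following is a minimal Python sketch of the underlying idea: count canonical k-mers across a set of reads, summarize them as a frequency spectrum, and compute the GC fraction of a k-mer. It is an illustration only and not KAT's implementation; the function names (`canonical`, `count_kmers`, `spectrum`, `gc_fraction`) are hypothetical, and counting a k-mer together with its reverse complement is an assumption made here for simplicity.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts: Counter) -> Counter:
    """Map each frequency to the number of distinct k-mers observed that many times."""
    return Counter(counts.values())


def gc_fraction(kmer: str) -> float:
    """Fraction of G and C bases in a k-mer."""
    return (kmer.count("G") + kmer.count("C")) / len(kmer)


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    reads = ["ACGTAC", "ACGTAC", "ACNGT"]
    counts = count_kmers(reads, k=3)
    print(dict(counts))
    print(dict(spectrum(counts)))
```

In a sketch like this, the spectrum is the quantity one would plot as a histogram: with real sequencing data, k-mers seen only once or twice tend to come from errors, while k-mers from the genome cluster around the sequencing coverage.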
Citation impact
- FWCI: 14.10
- Percentile: 100%
- References: 10
Authors
- Daniel Mapleson, Norwich Research Park, Earlham Institute
- Gonzalo Garcia Accinelli, Norwich Research Park, Earlham Institute
- George Kettleborough, Norwich Research Park, Earlham Institute
- Jonathan Wright, Norwich Research Park, Earlham Institute
- Bernardo Clavijo (corresponding author), Norwich Research Park, Earlham Institute
Topics & keywords
- Computer science
- Sequence assembly
- Reference genome
- Software
- Pairwise comparison
- Data mining
- Genome
- MIT License