SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data
BGI Group (China) · Fujian Medical University · +5 more institutions
Abstract
Quality control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures, and highly scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a "QC-Preprocess-QC" workflow and MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into 1 executable and predefines their…
Citation impact
- FWCI: 43.95
- Percentile: 100%
- References: 41
Authors: 15
Topics & keywords
- Computer science
- Scalability
- Workflow
- Preprocessor
- Executable
- Benchmarking
- Data mining
- Throughput