An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
University of Michigan · The University of Texas Health Science Center at Houston · +1 more institution
Abstract
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes…
Citation impact
- FWCI: —
- Percentile: —
- References: 20
Authors
- 4
Topics & keywords
- 1000 Genomes Project
- Exome
- Pipeline (software)
- Biology
- Exome sequencing
- Computational biology
- Scalability
- Genotyping