Article · Genome Research · Apr 16, 2015 · Bronze OA

An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data

University of Michigan · The University of Texas Health Science Center at Houston · and one other institution

Indexed in Crossref and PubMed

Abstract

The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes…
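The abstract says likely artifacts are filtered "using machine-learning techniques" but does not name the model here. As a rough illustration of the general idea only, the sketch below trains a plain logistic-regression classifier on per-site summary features and keeps sites scored as likely real. Logistic regression is a swapped-in stand-in, not GotCloud's documented method; the feature names (`depth_z`, `strand_bias`, `mapq_mean`), the labeling scheme (known-good sites versus likely-artifact sites) and the toy values are all hypothetical assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantSite:
    # Hypothetical per-site summary features (assumptions, not GotCloud's fields).
    depth_z: float      # standardized read depth at the site
    strand_bias: float  # 0 = reads balanced across strands, 1 = all on one strand
    mapq_mean: float    # mean mapping quality rescaled to [0, 1]


def features(site: VariantSite) -> list[float]:
    # Leading 1.0 acts as the intercept (bias) term.
    return [1.0, site.depth_z, site.strand_bias, site.mapq_mean]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(sites, labels, lr=0.5, epochs=3000):
    """Full-batch gradient descent on the average logistic (log) loss."""
    w = [0.0] * 4
    n = len(sites)
    for _ in range(epochs):
        grad = [0.0] * 4
        for site, y in zip(sites, labels):
            x = features(site)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(4):
                grad[j] += (p - y) * x[j]
        for j in range(4):
            w[j] -= lr * grad[j] / n
    return w


def score(w, site: VariantSite) -> float:
    # Estimated probability that the site is a real variant.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(site))))


def filter_sites(w, sites, threshold=0.5):
    # Keep sites whose score reaches the threshold, preserving input order.
    return [s for s in sites if score(w, s) >= threshold]


# Toy training data (hypothetical): label 1 = known-good site, 0 = likely artifact.
GOOD = [
    VariantSite(0.0, 0.10, 0.98),
    VariantSite(0.4, 0.15, 0.95),
    VariantSite(-0.3, 0.20, 0.92),
    VariantSite(0.2, 0.12, 0.97),
]
ARTIFACTS = [
    VariantSite(3.0, 0.90, 0.30),
    VariantSite(2.5, 0.85, 0.40),
    VariantSite(3.5, 0.80, 0.25),
    VariantSite(2.8, 0.95, 0.35),
]

weights = train_logistic(GOOD + ARTIFACTS, [1] * len(GOOD) + [0] * len(ARTIFACTS))
kept = filter_sites(weights, GOOD + ARTIFACTS)
```

In practice the training labels, feature set and classifier would come from the pipeline itself; the point of the sketch is only that a model fitted on sites of known status can score new candidate sites, which are then kept or dropped by a threshold.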

