A high-performance computing toolset for relatedness and principal component analysis of SNP data
Indexed in: Crossref, DOAJ, PubMed
Abstract
Summary: Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show the uniprocessor implementations of PCA and identity-by-descent are ∼8–50 times faster than the implementations provided in the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs, respectively, and can be…
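To make the PCA step named in the summary concrete, here is a minimal pure-Python sketch of principal component analysis on a small samples-by-SNPs genotype matrix. It is an illustration only and not the SNPRelate or EIGENSTRAT implementation: the function names, the toy data, the per-SNP standardization by allele frequency, and the use of power iteration are all assumptions made for this example.

```python
import math


def standardize_genotypes(genotypes):
    """Center and scale each SNP of a samples x SNPs matrix of 0/1/2 allele counts.

    Assumed convention for this sketch: subtract 2p and divide by
    sqrt(2p(1-p)), where p is the estimated allele frequency of the SNP.
    """
    n_samples = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n_samples)
        if p <= 0.0 or p >= 1.0:
            continue  # skip monomorphic SNPs, which carry no signal
        scale = math.sqrt(2.0 * p * (1.0 - p))
        columns.append([(g - 2.0 * p) / scale for g in col])
    return columns


def genetic_covariance(genotypes):
    """Return the samples x samples covariance matrix averaged over SNPs."""
    columns = standardize_genotypes(genotypes)
    n = len(genotypes)
    cov = [[0.0] * n for _ in range(n)]
    for col in columns:
        for i in range(n):
            for k in range(i, n):
                cov[i][k] += col[i] * col[k]
    for i in range(n):
        for k in range(i, n):
            cov[i][k] /= len(columns)
            cov[k][i] = cov[i][k]  # fill the symmetric half
    return cov


def top_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    n = len(cov)
    # Non-constant start: a constant vector is orthogonal to the signal
    # because every standardized SNP sums to zero across samples.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(cov[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical toy data: samples 0-1 and samples 2-3 form two groups.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
pc1 = top_component(genetic_covariance(toy_genotypes))
```

On this toy input the first component gives the two groups opposite signs. The triple loop costs time proportional to SNPs × samples², which is the kind of kernel the summary describes as being optimized in C/C++ and parallelized.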
Citation impact
- Total citations: 2,743
- FWCI: 14.92
- Percentile: 100%
- References: 14
Authors: 6
Topics & keywords
Keywords
- Principal component analysis
- Computer science
- Uniprocessor system
- Implementation
- Computation
- Component (thermodynamics)
- Data mining
- Parallel computing