Article · Nov 15, 2008 · Closed access

Benchmarking GPUs to tune dense linear algebra

University of California, Berkeley

Abstract

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor’s implementation and approaches the peak of hardware capabilities. Our LU, QR, and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs achieves up to ~540 Gflop/s. These results are accomplished by challenging the accepted view of the GPU architecture and programming guidelines. We argue that modern GPUs should be viewed as multithreaded multicore vector units. We exploit blocking, as on vector computers, and the heterogeneity of the system by computing on both the GPU and the CPU. This study includes…
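The blocking idea mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' GPU kernel: the function names `blocked_gemm` and `gemm_gflops`, the list-of-lists matrix layout, and the default tile size are all assumptions chosen for illustration. The sketch only shows how a matrix product can be accumulated tile by tile, and how a Gflop/s rate is conventionally derived by counting 2·m·n·k floating-point operations.

```python
def blocked_gemm(a, b, block=2):
    """Return the product a*b for list-of-lists matrices, one tile at a time.

    Hypothetical helper for illustration; `block` is an arbitrary tile size.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Visit tiles of C; each C tile accumulates products of an A tile and a B tile.
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, m)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ip * row_b[j]
    return c


def gemm_gflops(m, n, k, seconds):
    """Rate in Gflop/s, counting 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(blocked_gemm(a, b))
```

On real hardware the tile size would be chosen so that a tile fits in the fastest memory available (registers or on-chip memory); the value used here is only a placeholder.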

Citation impact

  • Total citations: 726
  • FWCI: 97.39
  • Percentile: 100%
  • References: 18

Authors

2

Topics & keywords

Keywords
  • Parallel computing
  • Computer science
  • Linear algebra
  • Benchmarking
  • Cholesky decomposition
  • CUDA
  • Multi-core processor
  • Kernel (algebra)