Benchmarking GPUs to tune dense linear algebra

Volkov, V. M.; Demmel, James

doi:10.5555/1413370.1413402

Article | Nov 15, 2008 | Closed access

University of California, Berkeley

Abstract

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor’s implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs achieves up to ~540 Gflop/s. These results are accomplished by challenging the accepted view of the GPU architecture and programming guidelines. We argue that modern GPUs should be viewed as multithreaded multicore vector units. We exploit blocking, as on vector computers, and the heterogeneity of the system by computing on both the GPU and the CPU. This study includes…

Citation impact

Total citations: 726

FWCI: 97.39
Percentile: 100%
References: 18

Authors: 2

Topics & keywords

Parallel computing
Computer science
Linear algebra
Benchmarking
Cholesky decomposition
CUDA
Multi-core processor
Kernel (algebra)
