Anatomy of high-performance matrix multiplication
The University of Texas at Austin
Abstract
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
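The abstract does not spell out the algorithm itself. As a rough illustration of why a multilevel memory hierarchy shapes the loop structure, the sketch below partitions the three loops of a matrix product into blocks so that each block of operands is reused while it is still small enough to stay in fast memory. This is a generic loop-blocking sketch, not the GotoBLAS implementation: the function name `blocked_matmul` and the block sizes `mc`, `kc`, `nc` are assumptions chosen for illustration, and there is no operand packing or tuned inner kernel.

```python
def blocked_matmul(a, b, mc=2, kc=2, nc=2):
    """Multiply a (m x k) by b (k x n), both non-empty lists of lists.

    The block sizes mc, kc, nc are hypothetical placeholders; a real
    library would choose them to match the cache sizes of the target machine.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Outer three loops walk over blocks of columns of C, blocks of the
    # shared dimension, and blocks of rows of C.
    for jc in range(0, n, nc):
        for pc in range(0, k, kc):
            for ic in range(0, m, mc):
                # Inner loops update one (mc x nc) block of C using an
                # (mc x kc) block of A and a (kc x nc) block of B.
                for i in range(ic, min(ic + mc, m)):
                    row_a, row_c = a[i], c[i]
                    for p in range(pc, min(pc + kc, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(jc, min(jc + nc, n)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

Blocking changes only the order in which the partial products are accumulated, so the result matches that of the unblocked triple loop; the benefit on real hardware comes from improved data reuse, which pure Python will not demonstrate.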
Citation impact
756 total citations
- FWCI: 36.59
- Percentile: 100%
- References: 21
Topics & keywords
- Computer science
- Matrix multiplication
- Multiplication (music)
- Simple (philosophy)
- Selection (genetic algorithm)
- Implementation
- Matrix (chemical analysis)
- Parallel computing