cuDNN: Efficient Primitives for Deep Learning
Indexed in: arXiv, DataCite
Abstract
We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the Basic Linear Algebra Subroutines (BLAS). However, there is no analogous library for deep learning. Without such a library, researchers implementing deep learning workloads on parallel processors must create and optimize their own implementations of the main computational kernels, and this work must be repeated as new…
Citation impact
- Total citations: 1,028
- FWCI: —
- Percentile: —
- References: 16
Authors: 7
Topics & keywords
Topics
Keywords
- Computer science
- Deep learning
- Implementation
- Parallel computing
- Subroutine
- Artificial intelligence
- Computer architecture
- Programming language