Preprint | arXiv (Cornell University) | Oct 3, 2014 | Green OA

cuDNN: Efficient Primitives for Deep Learning

Indexed in: arXiv, DataCite

Abstract

We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the Basic Linear Algebra Subroutines (BLAS). However, there is no analogous library for deep learning. Without such a library, researchers implementing deep learning workloads on parallel processors must create and optimize their own implementations of the main computational kernels, and this work must be repeated as new…

Citation impact

Total citations: 1,028
References: 16

Authors

7

Topics & keywords

Keywords
  • Computer science
  • Deep learning
  • Implementation
  • Parallel computing
  • Subroutine
  • Artificial intelligence
  • Computer architecture
  • Programming language