cuDNN: Efficient Primitives for Deep Learning

Chetlur, Sharan; Woolley, Cliff; Vandermersch, Philippe; Cohen, Jonathan; Tran, John; Catanzaro, Bryan; Shelhamer, Evan

doi:10.48550/arxiv.1410.0759

Preprint · arXiv (Cornell University) · Oct 3, 2014 · Green OA

Indexed in: arXiv, DataCite

Abstract

We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the Basic Linear Algebra Subroutines (BLAS). However, there is no analogous library for deep learning. Without such a library, researchers implementing deep learning workloads on parallel processors must create and optimize their own implementations of the main computational kernels, and this work must be repeated as new…

Citation impact

Total citations: 1,028

FWCI: —
Percentile: —
References: 16

Authors: 7

Topics & keywords

Keywords

Computer science
Deep learning
Implementation
Parallel computing
Subroutine
Artificial intelligence
Computer architecture
Programming language
