Improving the speed of neural networks on CPUs

Vanhoucke, Vincent; Senior, Andrew; Mao, M.

Article | Jan 1, 2011 | Closed access

Abstract

Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to train and run such networks. This paper is a tutorial for students and researchers on some of the techniques that can be used to reduce this computational cost considerably on modern x86 CPUs. We emphasize data layout, batching of the computation, the use of SSE2 instructions, and particularly leverage SSSE3 and SSE4 fixed-point instructions which provide a 3 ×…

Citation impact

Total citations: 677

FWCI: 18.42
Percentile: 100%
References: 10

Authors: 3

Topics & keywords

Keywords

Speedup
Computer science
x86
Artificial neural network
Leverage (statistics)
Deep learning
Computation
Floating point

UN Sustainable Development Goals

Quality Education