Article, Jan 1, 2011, Closed access

Improving the speed of neural networks on CPUs

Abstract

Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to train and run such networks. This paper is a tutorial for students and researchers on some of the techniques that can be used to reduce this computational cost considerably on modern x86 CPUs. We emphasize data layout, batching of the computation, the use of SSE2 instructions, and particularly leverage SSSE3 and SSE4 fixed-point instructions which provide a 3 ×…

Citation impact

Total citations: 677
FWCI: 18.42
Percentile: 100%
References: 10

Authors

3

Topics & keywords

Keywords
  • Speedup
  • Computer science
  • x86
  • Artificial neural network
  • Leverage (statistics)
  • Deep learning
  • Computation
  • Floating point
UN Sustainable Development Goals
  • Quality Education