Article, Jan 1, 2011, Closed access
Improving the speed of neural networks on CPUs
Abstract
Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to train and run such networks. This paper is a tutorial for students and researchers on some of the techniques that can be used to reduce this computational cost considerably on modern x86 CPUs. We emphasize data layout, batching of the computation, the use of SSE2 instructions, and particularly leverage SSSE3 and SSE4 fixed-point instructions which provide a 3 ×…
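The fixed-point idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names and the max-based scaling scheme below are assumptions chosen for illustration, and plain Python integers stand in for SIMD registers. The pairing of signed 8-bit weights with unsigned 8-bit activations mirrors the operand types of the SSSE3 multiply-add instruction `pmaddubsw`, which multiplies unsigned bytes by signed bytes.

```python
def quantize_signed(values, bits=8):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1].

    Returns (quantized_values, scale). The max-abs scaling is an assumption.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0  # avoid division by zero
    scale = qmax / peak
    return [int(round(v * scale)) for v in values], scale


def quantize_unsigned(values, bits=8):
    """Map non-negative floats to unsigned integers in [0, 2**bits - 1].

    Negative inputs are clamped to 0. Returns (quantized_values, scale).
    """
    qmax = 2 ** bits - 1
    peak = max(max(values), 0.0) or 1.0  # avoid division by zero
    scale = qmax / peak
    quantized = [max(0, min(qmax, int(round(v * scale)))) for v in values]
    return quantized, scale


def fixed_point_dot(weights, activations):
    """Approximate a float dot product using 8-bit operands.

    Products are accumulated as integers; a single division at the end
    converts the accumulator back to the floating-point range.
    """
    qw, w_scale = quantize_signed(weights)
    qa, a_scale = quantize_unsigned(activations)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer-only inner loop
    return acc / (w_scale * a_scale)


weights = [0.5, -0.25, 1.0]
activations = [0.0, 0.5, 1.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = fixed_point_dot(weights, activations)
print(exact, approx)
```

The approximate result differs from the exact one only by quantization error, which is the trade-off that makes narrow integer arithmetic attractive: more values fit in each vector register, at the cost of reduced precision.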
Citation impact
- Total citations: 677
- FWCI: 18.42
- Percentile: 100%
- References: 10
Authors: 3
Topics & keywords
Keywords
- Speedup
- Computer science
- x86
- Artificial neural network
- Leverage (statistics)
- Deep learning
- Computation
- Floating point
UN Sustainable Development Goals
- Quality Education