Article · Sep 6, 2015 · Closed access
Scalable distributed DNN training using commodity GPU cloud computing
Indexed in Crossref
Abstract
We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD) training of Deep Neural Networks (DNN). The method solves the well-known communication bottleneck problem that arises for data-parallel SGD because compute nodes frequently need to synchronize a replica of the model. We solve it by purposefully controlling the rate of weight-update per individual weight, which is in contrast to the uniform update-rate customarily imposed by the size of a mini-batch. It is shown empirically that the method can reduce the amount of communication by three orders of magnitude while training a typical DNN for acoustic modelling. This reduction in communication bandwidth enables efficient scaling…
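The abstract does not say how the per-weight update rate is controlled. As a rough illustration only, the sketch below assumes a threshold scheme: each node accumulates its gradient into a local residual and transmits an update for a weight only once that weight's accumulated value reaches a threshold. The function names (`select_updates`, `apply_updates`), the threshold `tau`, and the learning rate `lr` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_updates(residual: List[float], gradient: List[float],
                   tau: float) -> List[Tuple[int, float]]:
    """Add the gradient to the local residual and return sparse messages.

    A message (index, +tau or -tau) is emitted only for weights whose
    accumulated residual has reached the threshold in magnitude. The sent
    amount is subtracted from the residual, so nothing is discarded.
    """
    messages = []
    for i, g in enumerate(gradient):
        residual[i] += g
        if residual[i] >= tau:
            messages.append((i, tau))
            residual[i] -= tau
        elif residual[i] <= -tau:
            messages.append((i, -tau))
            residual[i] += tau
    return messages


def apply_updates(weights: List[float],
                  messages: List[Tuple[int, float]], lr: float) -> None:
    """Apply received sparse messages to a model replica (plain SGD step)."""
    for i, delta in messages:
        weights[i] -= lr * delta


# Toy run: the same gradient arrives in two consecutive mini-batches.
residual = [0.0, 0.0, 0.0]
gradient = [0.75, -0.25, 0.125]
step1 = select_updates(residual, gradient, tau=0.5)
step2 = select_updates(residual, gradient, tau=0.5)

weights = [1.0, 1.0, 1.0]
apply_updates(weights, step1 + step2, lr=1.0)
```

Under this assumption, a weight with a large gradient is communicated on almost every mini-batch, while a weight with a small gradient is communicated only occasionally, so each weight ends up with its own update rate and most entries are never sent in a given step.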
Citation impact
- Total citations: 547
- FWCI: 28.50
- Percentile: 100%
- References: 23
Authors (1)
Topics & keywords
Topics
Keywords
- Computer science
- Cloud computing
- Scalability
- Commodity
- Training (meteorology)
- Distributed computing
- Parallel computing
- Operating system
UN Sustainable Development Goals
- Industry, innovation and infrastructure