Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Indexed in: arXiv, DataCite
Abstract
Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large, which implies nontrivial growth in the SGD minibatch size. In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large…
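The scheme the abstract describes, dividing one SGD minibatch over a pool of parallel workers, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the paper's implementation: the scalar linear model, the squared-error loss, and the function names `gradient`, `split_evenly`, and `synchronous_sgd_step` are all hypothetical. The workers are simulated sequentially in one process. The sketch only shows that averaging per-worker gradients over equal-sized shards gives the same update as one worker processing the whole minibatch.

```python
import random


def gradient(w, batch):
    """Mean gradient of the loss 0.5 * (w * x - y) ** 2 over a batch, for a scalar weight w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def split_evenly(batch, num_workers):
    """Divide a minibatch into equal-sized shards, one per (simulated) worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("minibatch size must be divisible by the number of workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def synchronous_sgd_step(w, batch, num_workers, lr):
    """One synchronous step: each worker computes a shard gradient, then all are averaged."""
    shards = split_evenly(batch, num_workers)
    # In a real system these gradients would be computed in parallel on separate devices.
    worker_grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(worker_grads) / num_workers
    return w - lr * avg_grad


# Toy data (assumed): y = 3x, with a fixed seed so the run is reproducible.
rng = random.Random(0)
batch = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(32))]

w0, lr = 0.0, 0.1
single_worker = w0 - lr * gradient(w0, batch)
eight_workers = synchronous_sgd_step(w0, batch, num_workers=8, lr=lr)

# The two updates agree up to floating-point rounding.
assert abs(single_worker - eight_workers) < 1e-12
```

Because the shards are equal in size, the mean of the shard means equals the mean over the full minibatch, so the number of workers changes how the work is distributed, not the update itself. Keeping each worker's shard large while adding workers is what makes the total minibatch grow, as the abstract notes.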
Citation impact
2,619 total citations
- FWCI: —
- Percentile: —
- References: 33
Authors: 9
Topics & keywords
Topics
Keywords
- Training (meteorology)
- Computer science
- Artificial intelligence
- Geography
- Meteorology