Preprint, arXiv (Cornell University), Jun 8, 2017, Green OA

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Indexed in: arXiv, DataCite

Abstract

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large, which implies nontrivial growth in the SGD minibatch size. In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large…
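
The idea of dividing an SGD minibatch over parallel workers can be illustrated with a minimal sketch. It is not the paper's implementation: the scalar linear model, the helper names grad_mse and sync_sgd_step, and the learning rate are all assumptions chosen for illustration, and the workers are simulated sequentially in one process. It shows only that averaging the gradients of equal-size shards reproduces the gradient of the full minibatch.

```python
import random


def grad_mse(w, xs, ys):
    """Mean gradient of 0.5 * (w*x - y)**2 over the examples (scalar weight)."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def sync_sgd_step(w, xs, ys, num_workers, lr):
    """One synchronous SGD step with the minibatch sharded across workers.

    Hypothetical helper: each "worker" is just a slice of the minibatch,
    processed in turn, and the averaging stands in for an all-reduce.
    """
    n = len(xs)
    assert n % num_workers == 0, "equal-size shards are assumed"
    shard = n // num_workers
    worker_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    # Average the per-worker gradients, then apply a single shared update.
    g = sum(worker_grads) / num_workers
    return w - lr * g


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
ys = [3.0 * x for x in xs]  # synthetic targets from a true weight of 3.0

w0 = 0.0
w_single = w0 - 0.1 * grad_mse(w0, xs, ys)                  # one worker, full minibatch
w_dist = sync_sgd_step(w0, xs, ys, num_workers=4, lr=0.1)   # four simulated workers
```

With equal-size shards the two updates agree up to floating-point rounding, so the per-worker workload shrinks as workers are added unless the total minibatch grows with them.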

Citation impact

Total citations: 2,619
References: 33

Authors

9

Topics & keywords

Keywords
  • Training (meteorology)
  • Computer science
  • Artificial intelligence
  • Geography
  • Meteorology