Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Goyal, Priya; Dollár, Piotr; Girshick, Ross; Noordhuis, Pieter; Wesolowski, Lukasz; Kyrola, Aapo; Tulloch, Andrew; Jia, Yangqing; He, Kaiming

doi:10.48550/arxiv.1706.02677

Preprint, arXiv (Cornell University), Jun 8, 2017. Green OA.

Indexed in: arXiv, DataCite

Abstract

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large, which implies nontrivial growth in the SGD minibatch size. In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large…
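The abstract's description of distributed synchronous SGD, in which one minibatch is divided over a pool of parallel workers, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy one-parameter least-squares loss, the helper names `grad_mse`, `shard`, and `sync_sgd_step`, and the simulation of workers as a plain Python loop rather than separate processes. It shows only that averaging equal-sized per-worker gradients reproduces the gradient of the full minibatch.

```python
import random

def grad_mse(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, num_workers):
    """Split a minibatch into equal-sized per-worker shards (hypothetical helper)."""
    assert len(batch) % num_workers == 0, "minibatch must divide evenly"
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def sync_sgd_step(w, batch, num_workers, lr):
    """One synchronous step: each simulated worker computes a local gradient,
    the gradients are averaged (standing in for an all-reduce), and a single
    shared update is applied."""
    local_grads = [grad_mse(w, s) for s in shard(batch, num_workers)]
    g = sum(local_grads) / num_workers
    return w - lr * g

# Toy data: y = 3x, 32 examples, seeded for reproducibility.
rng = random.Random(0)
batch = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(32))]

w_single = 0.0 - 0.1 * grad_mse(0.0, batch)                 # one worker, full minibatch
w_multi = sync_sgd_step(0.0, batch, num_workers=8, lr=0.1)  # 8 workers, 4 examples each
```

Under these assumptions `w_single` and `w_multi` agree up to floating-point rounding. With a fixed total minibatch, each worker's share shrinks as workers are added (4 examples each here), which is the reason the abstract gives for growing the minibatch size to keep the per-worker workload large.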

Citation impact

Total citations: 2,619
FWCI: —
Percentile: —
References: 33

Authors: 9

Topics & keywords

Keywords

Training (meteorology)
Computer science
Artificial intelligence
Geography
Meteorology
