Efficient mini-batch training for stochastic optimization

Li, Mu; Zhang, Tong; Chen, Yuqiang; Smola, Alexander J.

doi:10.1145/2623330.2623612

Article · Aug 22, 2014 · Closed access

Carnegie Mellon University · Baidu (China)

Indexed in Crossref

Abstract

Stochastic gradient descent (SGD) is a popular technique for large-scale optimization problems in machine learning. In order to parallelize SGD, minibatch training needs to be employed to reduce the communication cost. However, an increase in minibatch size typically decreases the rate of convergence. This paper introduces a technique based on approximate optimization of a conservatively regularized objective function within each minibatch. We prove that the convergence rate does not decrease with increasing minibatch size. Experiments demonstrate that with suitable implementations of approximate optimization, the resulting algorithm can outperform standard SGD in many scenarios.
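The abstract does not state the exact objective, so the following is only a minimal sketch of one common reading of "approximate optimization of a conservatively regularized objective within each minibatch": at each step, a few gradient steps approximately minimize the minibatch loss plus a quadratic penalty that keeps the new iterate close to the previous one. The names `prox_minibatch_step`, `gamma`, `inner_steps`, and `lr`, the least-squares loss, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import random


def grad_batch(w, batch):
    """Gradient of the mean of 0.5 * (w.x - y)^2 over one minibatch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(batch)
    return g


def prox_minibatch_step(w_prev, batch, gamma, inner_steps=5, lr=0.1):
    """Approximately minimize f_batch(w) + (gamma / 2) * ||w - w_prev||^2.

    Assumption: a handful of plain gradient steps started at w_prev stands in
    for the "approximate optimization"; other inner solvers are possible.
    """
    w = list(w_prev)
    for _ in range(inner_steps):
        g = grad_batch(w, batch)
        # Gradient of the penalized objective: loss gradient + gamma * (w - w_prev).
        w = [wi - lr * (gi + gamma * (wi - pi))
             for wi, gi, pi in zip(w, g, w_prev)]
    return w


def train(data, dim, batch_size=32, gamma=1.0, epochs=20, seed=0):
    """Run one penalized subproblem per minibatch, shuffling every epoch."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            w = prox_minibatch_step(w, data[i:i + batch_size], gamma)
    return w


def make_data(n=512, seed=1):
    """Noise-free synthetic regression data with hypothetical weights [2, -3]."""
    rng = random.Random(seed)
    true_w = [2.0, -3.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = sum(a * b for a, b in zip(true_w, x))
        data.append((x, y))
    return data


if __name__ == "__main__":
    print(train(make_data(), dim=2))
```

In this sketch a larger `gamma` makes each minibatch update more conservative, while `gamma = 0` with `inner_steps = 1` reduces the update to an ordinary minibatch SGD step.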

Citation impact

Total citations: 767

FWCI: 41.89
Percentile: 100%
References: 38

Authors: 4

Topics & keywords

Keywords

Stochastic gradient descent
Computer science
Convergence (economics)
Implementation
Stochastic optimization
Rate of convergence
Mathematical optimization
Optimization problem
