Article | International Conference on Machine Learning | Jun 28, 2011 | Closed access

On optimization methods for deep learning

Stanford University

Abstract

The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug, and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as limited-memory BFGS (L-BFGS) and conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the difference between L-BFGS/CG and SGDs is more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware…
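To make the contrast in the abstract concrete, here is a minimal sketch, not the paper's code or experimental setup. It assumes a toy, noise-free least-squares problem with badly scaled features. It compares per-sample SGD, which needs a hand-chosen learning rate, against a small L-BFGS routine with a backtracking (Armijo) line search, which chooses its own step size. All names (`make_data`, `lbfgs`, `sgd`) and all constants (learning rate, memory size, tolerances) are hypothetical choices made for illustration.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def make_data(n=200, seed=0):
    # Assumed toy data: three features with very different scales.
    rng = random.Random(seed)
    w_true = [2.0, -3.0, 0.5]
    X = [[rng.gauss(0, 1) * s for s in (1.0, 5.0, 0.2)] for _ in range(n)]
    y = [dot(w_true, x) for x in X]
    return X, y

def loss_and_grad(w, X, y):
    # Mean of 0.5 * residual^2 over the full batch, and its gradient.
    n, loss, grad = len(X), 0.0, [0.0] * len(w)
    for x, t in zip(X, y):
        r = dot(w, x) - t
        loss += 0.5 * r * r
        for j, xj in enumerate(x):
            grad[j] += r * xj
    return loss / n, [g / n for g in grad]

def lbfgs(f, w, m=5, iters=100, tol=1e-10):
    # L-BFGS with two-loop recursion and backtracking line search.
    loss, g = f(w)
    hist = []  # (s, y, rho) triples, oldest first
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        q, alphas = g[:], []
        for s, yv, rho in reversed(hist):          # newest to oldest
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, yv)]
        if hist:                                   # initial Hessian scaling
            s, yv, _ = hist[-1]
            gamma = dot(s, yv) / dot(yv, yv)
            q = [gamma * qi for qi in q]
        for (s, yv, rho), a in zip(hist, reversed(alphas)):  # oldest to newest
            b = rho * dot(yv, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        d = [-qi for qi in q]                      # search direction
        step, gd = 1.0, dot(g, d)
        while True:                                # halve step until Armijo holds
            w_new = [wi + step * di for wi, di in zip(w, d)]
            loss_new, g_new = f(w_new)
            if loss_new <= loss + 1e-4 * step * gd or step < 1e-12:
                break
            step *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        yv = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, yv)
        if sy > 1e-12:                             # keep only curvature-positive pairs
            hist = (hist + [(s, yv, 1.0 / sy)])[-m:]
        w, loss, g = w_new, loss_new, g_new
    return w, loss

def sgd(X, y, w, lr, epochs, seed=0):
    # Plain per-sample SGD with a fixed, hand-picked learning rate.
    rng = random.Random(seed)
    order, w = list(range(len(X))), w[:]
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            r = dot(w, X[i]) - y[i]
            w = [wj - lr * r * xj for wj, xj in zip(w, X[i])]
    return w

X, y = make_data()
w_lbfgs, loss_lbfgs = lbfgs(lambda w: loss_and_grad(w, X, y), [0.0] * 3)
w_sgd = sgd(X, y, [0.0] * 3, lr=0.005, epochs=5)
loss_sgd = loss_and_grad(w_sgd, X, y)[0]
print("L-BFGS loss:", loss_lbfgs)
print("SGD loss:   ", loss_sgd)
```

On this toy problem the SGD result depends strongly on `lr`: a much larger value can diverge because of the large-scale feature, while a smaller one leaves the small-scale feature barely fitted after a few epochs. The line search removes that tuning knob, which is one way to read the abstract's claim that such methods "simplify" training. The sketch says nothing about the paper's actual networks or measured speedups.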

Citation impact

Total citations: 888
FWCI: 51.12
Percentile: 100%
References: 42

Authors

6

Topics & keywords

Keywords
  • MNIST database
  • Broyden–Fletcher–Goldfarb–Shanno algorithm
  • Computer science
  • Stochastic gradient descent
  • Deep learning
  • Speedup
  • Artificial intelligence
  • Conjugate gradient method