On optimization methods for deep learning

Ngiam, Jiquan; Coates, Adam; Lahiri, Ahbik; Prochnow, Bobby; Le, Quoc V.; Ng, Andrew Y.

Article | International Conference on Machine Learning | Jun 28, 2011 | Closed access

Abstract

The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as limited-memory BFGS (L-BFGS) and conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the difference between L-BFGS/CG and SGDs is more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware…
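
The contrast the abstract draws can be illustrated with a small sketch. It is not the paper's experimental setup: the synthetic data, the L2-regularized logistic regression objective, the learning rates, and the helper names `loss_and_grad` and `sgd` are all assumptions chosen for brevity. It calls SciPy's off-the-shelf `minimize` routine with `method="L-BFGS-B"` and `method="CG"`, both of which perform their own line search, and compares them with a hand-written minibatch SGD loop that needs a learning rate chosen by the user.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Synthetic binary classification data (an assumption, not the paper's data).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(500, 20))
w_true = data_rng.normal(size=20)
y = (X @ w_true + 0.5 * data_rng.normal(size=500) > 0).astype(float)
LAM = 1e-2  # L2 regularization strength (hypothetical value)


def loss_and_grad(w, Xb=X, yb=y):
    """Return (loss, gradient) of L2-regularized logistic regression."""
    z = Xb @ w
    # logaddexp(0, z) is a numerically stable log(1 + exp(z)).
    loss = np.mean(np.logaddexp(0.0, z) - yb * z) + 0.5 * LAM * (w @ w)
    grad = Xb.T @ (expit(z) - yb) / len(yb) + LAM * w
    return loss, grad


def sgd(w0, lr, epochs=20, batch=50, seed=1):
    """Plain minibatch SGD with a fixed, user-chosen learning rate."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            _, g = loss_and_grad(w, X[idx], y[idx])
            w -= lr * g
    return w


w0 = np.zeros(X.shape[1])
results = {}

# Off-the-shelf batch methods: no learning rate to tune, the line search
# inside each routine picks the step size.
for method in ("L-BFGS-B", "CG"):
    res = minimize(loss_and_grad, w0, jac=True, method=method)
    results[method] = res.fun

# SGD: the final loss depends on the learning rate tried.
for lr in (0.01, 0.1, 1.0):
    results[f"SGD lr={lr}"] = loss_and_grad(sgd(w0, lr))[0]

for name, val in results.items():
    print(f"{name:12s} final full-batch loss = {val:.4f}")
```

On a convex toy problem like this one, the two batch methods should reach essentially the same objective value without any step-size tuning, while the SGD result varies with the learning rate. The sketch makes no claim about speed or about the deep models studied in the paper.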

Citation impact

Total citations: 888

FWCI: 51.12
Percentile: 100%
References: 42

Authors: 6

Topics & keywords

Keywords

MNIST database
Broyden–Fletcher–Goldfarb–Shanno algorithm
Computer science
Stochastic gradient descent
Deep learning
Speedup
Artificial intelligence
Conjugate gradient method