On optimization methods for deep learning
Abstract
The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited-memory BFGS (L-BFGS) and Conjugate Gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the difference between L-BFGS/CG and SGDs is more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware…
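To make "L-BFGS with line search" concrete, here is a minimal, self-contained sketch of the method: the standard two-loop recursion builds a search direction from the last few (step, gradient-change) pairs, and a backtracking line search enforcing the Armijo sufficient-decrease condition picks the step length. Because the step length is chosen automatically at every iteration, there is no learning rate to tune, which is the property the abstract contrasts with SGDs. Everything here is an illustrative assumption rather than the paper's implementation: the function name `lbfgs`, the memory size of 5, the Armijo constant `1e-4`, the halving factor, and the toy quadratic objective are all hypothetical choices. The sketch uses full-batch gradients, does not show CG, and makes no attempt at the parallel or GPU execution discussed in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def lbfgs(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    memory: int = 5,        # hypothetical: number of (s, y) pairs kept
    max_iter: int = 200,
    tol: float = 1e-6,      # stop when the gradient norm falls below this
) -> Tuple[Vector, int]:
    """Illustrative L-BFGS with an Armijo backtracking line search.

    Returns the final point and the number of iterations used.
    """
    x = list(x0)
    g = grad(x)
    s_hist: List[Vector] = []  # recent steps s_k = x_{k+1} - x_k
    y_hist: List[Vector] = []  # recent gradient changes y_k = g_{k+1} - g_k
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, k
        # Two-loop recursion: computes r = H_k g without forming H_k.
        q = list(g)
        alphas = []  # (alpha_i, rho_i), newest pair first
        for s, y in reversed(list(zip(s_hist, y_hist))):
            rho = 1.0 / dot(y, s)
            alpha = rho * dot(s, q)
            alphas.append((alpha, rho))
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        # Scale the initial inverse Hessian by s^T y / y^T y of the newest pair.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        for (s, y), (alpha, rho) in zip(zip(s_hist, y_hist), reversed(alphas)):
            beta = rho * dot(y, r)
            r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # descent direction -H_k g
        # Backtracking line search: halve t until sufficient decrease holds.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-16 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s_new = [xn - xo for xn, xo in zip(x_new, x)]
        y_new = [gn - go for gn, go in zip(g_new, g)]
        if dot(s_new, y_new) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > memory:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x, max_iter


if __name__ == "__main__":
    # Toy ill-conditioned quadratic f(x) = sum(0.5 * a_i * x_i^2 - b_i * x_i),
    # whose exact minimizer is x_i = b_i / a_i.
    A = [1.0, 10.0, 100.0]
    b = [1.0, 2.0, 3.0]
    f = lambda x: sum(0.5 * a * xi * xi - bi * xi for a, xi, bi in zip(A, x, b))
    grad = lambda x: [a * xi - bi for a, xi, bi in zip(A, x, b)]
    x, iters = lbfgs(f, grad, [0.0, 0.0, 0.0])
    print(iters, x)
```

On this toy problem the gradient norm bound `tol` also bounds the distance to the exact minimizer, since each coordinate error equals the gradient component divided by `a_i >= 1`. A plain SGD loop on the same objective would need a hand-picked step size small enough for the `a_i = 100` coordinate, which in turn slows progress on the `a_i = 1` coordinate; the line search sidesteps that choice.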
Keywords
- MNIST database
- Broyden–Fletcher–Goldfarb–Shanno algorithm
- Computer science
- Stochastic gradient descent
- Deep learning
- Speedup
- Artificial intelligence
- Conjugate gradient method