On the importance of initialization and momentum in deep learning

Sutskever, Ilya; Martens, James; Dahl, George E.; Hinton, Geoffrey E.

Article · Jun 16, 2013 · Closed access

Google (United States) · University of Toronto

Abstract

Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks…
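The abstract's central recipe, stochastic gradient descent with momentum under a slowly increasing momentum schedule, can be sketched on a toy problem. The following is a minimal illustration only, not the authors' implementation: the function names, the schedule constants (`period=250`, `mu_max=0.99`), the learning rate, and the quadratic objective are all assumptions chosen for this sketch, and the gradient here is exact rather than stochastic.

```python
def momentum_schedule(step, mu_max=0.99, period=250):
    """Hypothetical slowly increasing schedule: starts at 0.5, rises
    in plateaus of `period` steps, and is capped at mu_max."""
    return min(1.0 - 0.5 / (step // period + 1), mu_max)


def sgd_momentum(grad_fn, theta, lr=0.01, steps=2000, mu_max=0.99):
    """Classical momentum update with a scheduled momentum coefficient.

    velocity <- mu_t * velocity - lr * gradient
    theta    <- theta + velocity
    """
    velocity = [0.0] * len(theta)
    for t in range(steps):
        mu = momentum_schedule(t, mu_max)
        grad = grad_fn(theta)
        velocity = [mu * v - lr * g for v, g in zip(velocity, grad)]
        theta = [p + v for p, v in zip(theta, velocity)]
    return theta


def quad_grad(theta):
    """Gradient of the toy objective f(x, y) = 0.5 * (x**2 + 100 * y**2),
    an ill-conditioned quadratic standing in for a real loss."""
    return [theta[0], 100.0 * theta[1]]


result = sgd_momentum(quad_grad, [1.0, 1.0])
```

Low momentum early keeps the first steps stable while the iterate is far from the minimum; raising it later speeds progress along the low-curvature direction. In a real network, `grad_fn` would return a minibatch gradient, and the initialization of `theta` would matter as much as the schedule, which is the abstract's other main point.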

Citation impact

Total citations: 3,534

FWCI: 209.56
Percentile: 100%
References: 28

Authors: 4

Topics & keywords

Keywords

Initialization
Momentum (technical analysis)
Computer science
Recurrent neural network
Gradient descent
Stochastic gradient descent
Deep learning
Artificial intelligence
