ADADELTA: An Adaptive Learning Rate Method

Zeiler, Matthew D.

doi:10.48550/arxiv.1212.5701

Preprint · arXiv (Cornell University) · Dec 22, 2012 · Green OA


Indexed in: arXiv, DataCite

Abstract

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first-order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities, and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large-scale voice dataset in a distributed cluster environment.
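The abstract does not spell out the update rule, so the following is only a minimal sketch of an ADADELTA-style per-dimension update as it is commonly stated: a decaying average of squared gradients, a decaying average of squared updates, and a step scaled by the ratio of their root mean squares. The function names, the toy quadratic objective, and the values `rho=0.95` and `eps=1e-6` are assumptions chosen for illustration, not settings taken from this record.

```python
import math


def adadelta_step(x, g, avg_sq_grad, avg_sq_step, rho=0.95, eps=1e-6):
    """Apply one in-place ADADELTA-style update to the parameter list x.

    rho and eps are illustrative defaults (assumptions), not prescribed values.
    """
    for i in range(len(x)):
        # Decaying average of squared gradients, kept per dimension.
        avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * g[i] ** 2
        # Scale the gradient by RMS(previous updates) / RMS(gradients);
        # no global learning rate appears anywhere.
        rms_step = math.sqrt(avg_sq_step[i] + eps)
        rms_grad = math.sqrt(avg_sq_grad[i] + eps)
        step = -(rms_step / rms_grad) * g[i]
        # Decaying average of squared updates, used by the next step.
        avg_sq_step[i] = rho * avg_sq_step[i] + (1.0 - rho) * step ** 2
        x[i] += step


def objective(x):
    # Hypothetical toy problem: a poorly scaled quadratic bowl.
    return x[0] ** 2 + 10.0 * x[1] ** 2


def objective_grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


def minimize(x0, steps=1000):
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)
    avg_sq_step = [0.0] * len(x)
    for _ in range(steps):
        adadelta_step(x, objective_grad(x), avg_sq_grad, avg_sq_step)
    return x


start = [1.0, -1.0]
result = minimize(start)
print(objective(start), objective(result))
```

Only first-order information (the gradient) is used, and the extra cost over plain SGD is two accumulators per parameter, which is consistent with the "minimal computational overhead" described in the abstract.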

Citation impact

Total citations: 5,526

References: 6


Authors

Matthew D. Zeiler (corresponding author)

Topics & keywords

Keywords

MNIST database
Stochastic gradient descent
Computer science
Overhead (engineering)
Artificial intelligence
Gradient descent
Hyperparameter
Dimension (graph theory)
