LSTM: A Search Space Odyssey
Dalle Molle Institute for Artificial Intelligence Research · Università della Svizzera italiana
Abstract
Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed…
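The abstract mentions that hyperparameters were tuned by random search. As a rough illustration of that general technique only, here is a minimal sketch in Python. The hyperparameter names, their ranges, and the `toy_objective` function are hypothetical placeholders, not the settings or the training procedure used in the paper; in practice the objective would train a network and return its validation error.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter configuration (ranges are assumptions)."""
    return {
        # Log-uniform sample: the exponent is uniform, so each decade is equally likely.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": rng.randint(20, 200),
        "momentum": rng.uniform(0.0, 0.99),
    }


def toy_objective(config):
    """Placeholder for 'train a model, return validation error' (lower is better)."""
    lr_term = (math.log10(config["learning_rate"]) + 4) ** 2
    size_term = (config["hidden_size"] - 100) ** 2 / 1e4
    return lr_term + size_term


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independently sampled configurations and keep the best."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((objective(config), config))
    best_score, best_config = min(trials, key=lambda t: t[0])
    return best_score, best_config, trials


if __name__ == "__main__":
    score, config, _ = random_search(toy_objective, n_trials=50)
    print(score, config)
```

Because every trial is sampled independently, the full list of `(score, config)` pairs can also be kept and inspected afterwards to see how the score varies with each hyperparameter.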
Citation impact
- FWCI: 362.86
- Percentile: 100%
- References: 70
Authors
- Klaus Greff (corresponding author)
Dalle Molle Institute for Artificial Intelligence Research, Università della Svizzera italiana
- Rupesh K. Srivastava
Dalle Molle Institute for Artificial Intelligence Research, Università della Svizzera italiana
- Jan Koutník
Dalle Molle Institute for Artificial Intelligence Research, Università della Svizzera italiana
- Bas R. Steunebrink
Dalle Molle Institute for Artificial Intelligence Research, Università della Svizzera italiana
- Jürgen Schmidhuber
Dalle Molle Institute for Artificial Intelligence Research, Università della Svizzera italiana
Topics & keywords
- Hyperparameter
- Computer science
- Task (project management)
- Artificial intelligence
- Recurrent neural network
- Inference
- Variety (cybernetics)
- Machine learning
- Quality Education