Unsupervised Learning of Video Representations using LSTMs
Abstract
We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds of input sequences - patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to…
Citation impact
- FWCI
- —
- Percentile
- —
- References
- 30
Authors
3- NSNitish SrivastavaCorresponding
University of Toronto
- EMElman Mansimov
University of Toronto
- SRSalakhutdinov, Ruslan
University of Toronto
Topics & keywords
- Computer science
- Encoder
- Artificial intelligence
- Representation (politics)
- Sequence (biology)
- Pattern recognition (psychology)
- Feature learning
- Convolutional neural network