Article | arXiv (Cornell University) | Feb 16, 2015 | Green OA

Unsupervised Learning of Video Representations using LSTMs

University of Toronto

Indexed in: arXiv, DataCite

Abstract

We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed-length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence or predicting the future sequence. We experiment with two kinds of input sequences: patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices, such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to…
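
The encoder/decoder arrangement described in the abstract can be sketched roughly as follows. This is a minimal illustration with untrained random weights, not the authors' implementation: the class `LSTM`, the functions `encode` and `decode`, the readout matrices `W_recon` and `W_future`, and all dimensions are hypothetical names and values chosen for this example. The `conditional` flag mimics the design choice of whether a decoder is fed its own previously generated output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    """Single-layer LSTM cell with random, untrained weights (illustrative only)."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One matrix holds the weights of all four gates: input, forget, output, candidate.
        self.W = rng.normal(0.0, 0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

def encode(encoder, frames):
    """Run the encoder over the whole sequence; the final (h, c) is the fixed-length representation."""
    h = np.zeros(encoder.hidden_dim)
    c = np.zeros(encoder.hidden_dim)
    for x in frames:
        h, c = encoder.step(x, h, c)
    return h, c

def decode(decoder, W_out, state, n_steps, conditional):
    """Unroll a decoder from the encoder state and emit n_steps frames."""
    h, c = state
    prev = np.zeros(W_out.shape[1])  # zero input at the first step
    outputs = []
    for _ in range(n_steps):
        h, c = decoder.step(prev, h, c)
        y = h @ W_out  # linear readout from hidden state to frame space
        outputs.append(y)
        if conditional:
            prev = y  # condition on the generated output; otherwise input stays zero
    return np.stack(outputs)

# Hypothetical sizes: 10 input frames of 16 features each, 32 hidden units.
rng = np.random.default_rng(0)
frame_dim, hidden_dim, n_frames, n_future = 16, 32, 10, 5
frames = rng.normal(size=(n_frames, frame_dim))

encoder = LSTM(frame_dim, hidden_dim, rng)
recon_decoder = LSTM(frame_dim, hidden_dim, rng)
future_decoder = LSTM(frame_dim, hidden_dim, rng)
W_recon = rng.normal(0.0, 0.1, size=(hidden_dim, frame_dim))
W_future = rng.normal(0.0, 0.1, size=(hidden_dim, frame_dim))

state = encode(encoder, frames)
reconstruction = decode(recon_decoder, W_recon, state, n_frames, conditional=False)
future = decode(future_decoder, W_future, state, n_future, conditional=True)
```

Both decoders start from the same encoder state, so the single fixed-length representation has to carry enough information for either task. Training (for example, minimizing a reconstruction or prediction error) is omitted here.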

Citation impact

Total citations: 1,664
References: 30

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Encoder
  • Artificial intelligence
  • Representation (politics)
  • Sequence (biology)
  • Pattern recognition (psychology)
  • Feature learning
  • Convolutional neural network