Unsupervised Learning of Video Representations using LSTMs

Srivastava, Nitish; Mansimov, Elman; Salakhutdinov, Ruslan

doi:10.48550/arxiv.1502.04681

Article | arXiv (Cornell University) | Feb 16, 2015 | Green OA

Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

University of Toronto

Indexed in: arXiv, DataCite

Abstract

We use multilayer Long Short-Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed-length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence or predicting the future sequence. We experiment with two kinds of input sequences: patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices, such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to…
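
The encoder-decoder arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single LSTM layer with random, untrained weights and no bias terms, and every name in it (`LSTMCell`, `encode`, `decode`, `conditional`, the layer sizes) is a hypothetical choice made for illustration. It only shows how one encoder state can feed two decoders, one for reconstruction and one for future prediction, and how a decoder may or may not condition on its own generated output.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Single LSTM layer (assumption: no biases, no peephole connections)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix per gate (i, f, o) and one for the candidate (g),
        # each applied to the concatenation [x, h].
        self.W = {k: rand_matrix(n_hid, n_in + n_hid, rng) for k in "ifog"}

    def step(self, x, state):
        h, c = state
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["g"], z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode(cell, frames):
    # Run the encoder over all frames; the final (h, c) pair is the
    # fixed-length representation, whatever the sequence length.
    state = ([0.0] * cell.n_hid, [0.0] * cell.n_hid)
    for x in frames:
        state = cell.step(x, state)
    return state


def decode(cell, W_out, state, n_steps, n_out, conditional):
    # Unroll a decoder from the encoder state. If `conditional` is True,
    # each step receives the previously generated output as its input;
    # otherwise it receives zeros.
    outputs = []
    prev = [0.0] * n_out
    for _ in range(n_steps):
        x = prev if conditional else [0.0] * n_out
        state = cell.step(x, state)
        y = matvec(W_out, state[0])  # linear readout from the hidden state
        outputs.append(y)
        prev = y
    return outputs


rng = random.Random(0)
n_in, n_hid, T = 4, 8, 5  # hypothetical sizes: frame dim, hidden dim, length
frames = [[rng.random() for _ in range(n_in)] for _ in range(T)]

encoder = LSTMCell(n_in, n_hid, rng)
recon_decoder = LSTMCell(n_in, n_hid, rng)
future_decoder = LSTMCell(n_in, n_hid, rng)
W_recon = rand_matrix(n_in, n_hid, rng)
W_future = rand_matrix(n_in, n_hid, rng)

representation = encode(encoder, frames)
reconstruction = decode(recon_decoder, W_recon, representation, T, n_in, conditional=False)
prediction = decode(future_decoder, W_future, representation, 3, n_in, conditional=True)
```

In this sketch the representation has the same size for any input length, which is what lets the same state initialize several decoders. Training (for example, minimizing reconstruction and prediction error) is omitted entirely.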

Citation impact

Total citations: 1,664

FWCI: —
Percentile: —
References: 30

Authors

Nitish Srivastava (corresponding author), University of Toronto
Elman Mansimov, University of Toronto
Ruslan Salakhutdinov, University of Toronto

Topics & keywords

Keywords

Computer science
Encoder
Artificial intelligence
Representation
Sequence
Pattern recognition
Feature learning
Convolutional neural network
