Describing Videos by Exploiting Temporal Structure

Yao, Li; Torabi, Atousa; Cho, Kyunghyun; Ballas, Nicolas; Pal, Christopher; Larochelle, Hugo; Courville, Aaron

doi:10.1109/iccv.2015.512

Article · Dec 1, 2015 · Closed access

Université de Sherbrooke · Université de Montréal · +1 more institution

Indexed in Crossref

Abstract

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition…

Citation impact

Total citations: 953

FWCI: 61.76
Percentile: 100%
References: 72

Authors: 7

Topics & keywords

Keywords

Computer science
Recurrent neural network
Artificial intelligence
Representation (politics)
Convolutional neural network
Context (archaeology)
Motion (physics)
Natural language
