Article · Dec 1, 2015 · Closed access

Describing Videos by Exploiting Temporal Structure

Université de Sherbrooke · Université de Montréal · +1 more institution

Indexed in Crossref

Abstract

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition…

Citation impact

  • Total citations: 953
  • FWCI: 61.76
  • Percentile: 100%
  • References: 72

Authors

7

Topics & keywords

Keywords
  • Computer science
  • Recurrent neural network
  • Artificial intelligence
  • Representation (politics)
  • Convolutional neural network
  • Context (archaeology)
  • Motion (physics)
  • Natural language

Funding