Article · Oct 1, 2017 · Closed access

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

University of Science and Technology of China · Microsoft Research Asia (China)

Indexed in Crossref

Abstract

Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, the development of a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3…
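
To make the cost argument in the abstract concrete, the sketch below counts the weights of a full 3 × 3 × 3 convolution and of a factorized alternative. The abstract is cut off after the 1 × 3 × 3 spatial kernel, so pairing it with a 3 × 1 × 1 temporal kernel is an assumption made here for illustration, as are the channel count and the helper name `conv3d_weights`. It is not the paper's implementation.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Weight count (bias excluded) of a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


channels = 64  # hypothetical channel width, chosen only for illustration

# Full spatio-temporal kernel: 3 frames x 3 rows x 3 columns.
full = conv3d_weights(channels, channels, 3, 3, 3)

# Spatial-only kernel named in the abstract: 1 x 3 x 3 (a 2D filter applied per frame).
spatial = conv3d_weights(channels, channels, 1, 3, 3)

# ASSUMPTION: a 3 x 1 x 1 temporal kernel completes the factorization.
# The abstract is truncated before it says what accompanies the 1 x 3 x 3 filter.
temporal = conv3d_weights(channels, channels, 3, 1, 1)

factorized = spatial + temporal

print(f"full 3x3x3 weights:      {full}")
print(f"factorized weights:      {factorized}")
print(f"factorized / full ratio: {factorized / full:.3f}")
```

Under these assumptions the factorized pair needs 9 + 3 = 12 weights per input-output channel pair instead of 27, regardless of the channel width chosen.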

Citation impact

Total citations: 1,811
FWCI: 49.63
Percentile: 100%
References: 47

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Convolutional neural network
  • Artificial intelligence
  • Residual
  • Bottleneck
  • Deep learning
  • Pattern recognition (psychology)
  • Representation (politics)
UN Sustainable Development Goals
  • Sustainable cities and communities