Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

Qiu, Zhaofan; Yao, Ting; Mei, Tao

doi:10.1109/iccv.2017.590

Article · Oct 1, 2017 · Closed access

Zhaofan Qiu · Ting Yao · Tao Mei

University of Science and Technology of China · Microsoft Research Asia (China)

Indexed in Crossref

Abstract

Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, the development of a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3…
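
The cost argument in the abstract can be made concrete with a small weight count. The abstract is cut off after "1 × 3 × 3", so the sketch below rests on an assumption that is not stated in the text shown here: that the spatial 1 × 3 × 3 filter is paired with a temporal 3 × 1 × 1 filter applied in series. The function names and the channel width of 64 are hypothetical and chosen only for illustration.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution with kernel kt x kh x kw (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def full_3d(channels):
    """Weights of a single 3 x 3 x 3 convolution that keeps the channel width."""
    return conv3d_weights(channels, channels, 3, 3, 3)


def factorized(channels):
    """Weights of a 1 x 3 x 3 spatial convolution followed by a 3 x 1 x 1 temporal one.

    ASSUMPTION: the temporal 3 x 1 x 1 companion and the serial arrangement are
    not given in the truncated abstract; they are used here for illustration.
    """
    spatial = conv3d_weights(channels, channels, 1, 3, 3)
    temporal = conv3d_weights(channels, channels, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    channels = 64  # hypothetical channel width
    print("full 3x3x3 weights:", full_3d(channels))
    print("factorized weights:", factorized(channels))
```

Under these assumptions the factorized pair needs 12 weights per input-output channel pair instead of 27, and its spatial part has the same shape as an ordinary 3 × 3 2D filter, which is what would allow weights from an off-the-shelf 2D network to be reused.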

Citation impact

Total citations: 1,811

FWCI: 49.63
Percentile: 100%
References: 47

Authors: 3

Topics & keywords

Keywords

Computer science
Convolutional neural network
Artificial intelligence
Residual
Bottleneck
Deep learning
Pattern recognition (psychology)
Representation (politics)

UN Sustainable Development Goals

Sustainable cities and communities
