Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
University of Science and Technology of China · Microsoft Research Asia (China)
Abstract
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to use a CNN to learn spatio-temporal video representations. A few studies have shown that performing 3D convolutions is a rewarding approach to capturing both the spatial and temporal dimensions of videos. However, developing a very deep 3D CNN from scratch incurs expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3…
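To make the cost argument concrete, the following minimal sketch compares the weight count of a full 3 × 3 × 3 convolution with that of a factorized pair. The abstract is truncated after the 1 × 3 × 3 spatial filter, so the 3 × 1 × 1 temporal filter used here is an assumption, as are the function names and the channel width of 64; this is an illustration of the general idea, not the paper's implementation.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Number of weights (no bias) in a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def full_3d_weights(channels):
    """Weights of a single 3 x 3 x 3 convolution with equal input/output channels."""
    return conv3d_weights(channels, channels, 3, 3, 3)


def factorized_weights(channels):
    """Weights of a 1 x 3 x 3 spatial convolution followed by a temporal one.

    The 3 x 1 x 1 temporal kernel is an assumption made for this sketch;
    the truncated abstract only shows the spatial part.
    """
    spatial = conv3d_weights(channels, channels, 1, 3, 3)
    temporal = conv3d_weights(channels, channels, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # hypothetical channel width, chosen only for illustration
    print(full_3d_weights(c))        # 110592
    print(factorized_weights(c))     # 49152
    print(full_3d_weights(c) / factorized_weights(c))  # 2.25
```

Under these assumptions the factorized pair needs 12 weights per channel pair instead of 27, and its spatial part has the same shape as an ordinary 3 × 3 2D filter, which is what would allow weights from an existing 2D network to be reused.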
Citation impact
- FWCI: 49.63
- Percentile: 100%
- References: 47
Authors
- 3
Topics & keywords
- Computer science
- Convolutional neural network
- Artificial intelligence
- Residual
- Bottleneck
- Deep learning
- Pattern recognition (psychology)
- Representation (politics)
- Sustainable cities and communities