Article · Oct 1, 2017 · Closed access

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

National Institute of Advanced Industrial Science and Technology

Indexed in Crossref

Abstract

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) can extract spatio-temporal features directly from videos for action recognition. Although 3D kernels tend to overfit because of their large number of parameters, 3D CNNs have improved greatly with the use of recent large-scale video databases. However, 3D CNN architectures remain relatively shallow compared with the very deep networks that have succeeded among 2D CNNs, such as residual networks (ResNets). In this paper, we propose 3D CNNs based on ResNets toward a better action representation. We describe the training procedure of our 3D ResNets in detail. We experimentally evaluate the 3D ResNets on the ActivityNet and…
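The abstract's remark about the parameter count of 3D kernels can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes a basic residual block made of two equal-width convolutions, a hypothetical width of 64 channels, 3x3 versus 3x3x3 kernels, and it ignores biases and normalization layers. The function names are illustrative only.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weight count of one convolution layer (bias omitted)."""
    n = in_ch * out_ch
    for k in kernel:
        n *= k
    return n


def basic_block_params(channels, kernel):
    """Weights in two stacked convolutions of equal width.

    Assumption: the shortcut is an identity connection and adds no weights.
    """
    return 2 * conv_params(channels, channels, kernel)


# Hypothetical width of 64 channels, chosen only for illustration.
params_2d = basic_block_params(64, (3, 3))      # spatial kernel
params_3d = basic_block_params(64, (3, 3, 3))   # spatio-temporal kernel

print(params_2d, params_3d, params_3d / params_2d)
```

Under these assumptions the 3D block carries three times the weights of its 2D counterpart, since the only difference is the extra temporal kernel dimension of size 3.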

Citation impact

Total citations: 680
FWCI: 14.42
Percentile: 100%
References: 29

Authors

3

Topics & keywords

Keywords
  • Overfitting
  • Computer science
  • Convolutional neural network
  • Residual
  • Code (set theory)
  • Action recognition
  • Artificial intelligence
  • Representation (politics)