Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

Hara, Kensho; Kataoka, Hirokatsu; Satoh, Yutaka

doi:10.1109/iccvw.2017.373

Article · Oct 1, 2017 · Closed access

National Institute of Advanced Industrial Science and Technology

Indexed in Crossref

Abstract

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) can extract spatio-temporal features directly from videos for action recognition. Although 3D kernels tend to overfit because of their large number of parameters, 3D CNNs have improved greatly with the use of recent large-scale video databases. However, 3D CNN architectures remain relatively shallow compared with the very deep networks that have succeeded among 2D CNNs, such as residual networks (ResNets). In this paper, we propose a 3D CNN based on ResNets toward a better action representation. We describe the training procedure of our 3D ResNets in detail. We experimentally evaluate the 3D ResNets on the ActivityNet and…
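
To make the abstract's remark about the parameter count of 3D kernels concrete, the following is a minimal sketch, not taken from the paper, that compares the number of weights in a 2D and a 3D convolution layer. The helper name `conv_weight_count`, the channel width of 64, and the 3x3 and 3x3x3 kernel sizes are illustrative assumptions.

```python
from math import prod


def conv_weight_count(in_channels, out_channels, kernel_size, bias=False):
    """Count the learnable parameters of one dense convolution layer.

    kernel_size is a tuple: (h, w) for a 2D kernel, (t, h, w) for a 3D kernel.
    """
    weights = in_channels * out_channels * prod(kernel_size)
    return weights + (out_channels if bias else 0)


# Illustrative layer sizes (assumptions, not values reported in the paper).
params_2d = conv_weight_count(64, 64, (3, 3))     # spatial kernel only
params_3d = conv_weight_count(64, 64, (3, 3, 3))  # adds a temporal extent of 3
ratio = params_3d / params_2d
```

Under these assumed sizes the 3D layer holds three times as many weights as its 2D counterpart, because the temporal extent multiplies the kernel volume. This is one way to read the abstract's statement that 3D kernels are more prone to overfitting.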

Citation impact

Total citations: 680

FWCI: 14.42
Percentile: 100%
References: 29

Authors: 3

Topics & keywords

Keywords

Overfitting
Computer science
Convolutional neural network
Residual
Code (set theory)
Action recognition
Artificial intelligence
Representation (politics)