Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks
Hong Kong University of Science and Technology · Lenovo (China) · +1 more institution
Abstract
Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has been reported. This has triggered us to investigate in this paper a new deep architecture which can handle 3D signals more effectively. Specifically, we propose factorized spatio-temporal convolutional…
Citation impact
- FWCI: 29.95
- Percentile: 100%
- References: 50
Authors
- 4
Topics & keywords
- Action recognition
- Computer science
- Convolutional neural network
- Action (physics)
- Artificial intelligence
- Pattern recognition (psychology)
- Physics