Spatiotemporal Multiplier Networks for Video Action Recognition
Graz University of Technology · Austrian Academy of Sciences · +1 more institution
Abstract
This paper presents a general ConvNet architecture for video action recognition based on multiplicative interactions of spacetime features. Our model combines the appearance and motion pathways of a two-stream architecture by motion gating and is trained end-to-end. We theoretically motivate multiplicative gating functions for residual networks and empirically study their effect on classification accuracy. To capture long-term dependencies we inject identity mapping kernels for learning temporal relationships. Our architecture is fully convolutional in spacetime and able to evaluate a video in a single forward pass. Empirical investigation reveals that our model produces state-of-the-art results on two…
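The motion gating mentioned in the abstract can be illustrated with a toy residual unit. This is a minimal sketch, not the authors' implementation: it assumes the gating takes the form x_a_next = x_a + F(x_a * x_m), where the appearance features are multiplied elementwise by the motion features before entering the residual branch. The abstract does not state this form. The names `gated_residual_unit` and `toy_residual_fn` are hypothetical, and flat lists of floats stand in for spatiotemporal feature maps.

```python
from typing import Callable, List

Vector = List[float]


def gated_residual_unit(
    appearance: Vector,
    motion: Vector,
    residual_fn: Callable[[Vector], Vector],
) -> Vector:
    """Assumed form of a motion-gated residual unit: x_a + F(x_a * x_m)."""
    if len(appearance) != len(motion):
        raise ValueError("appearance and motion features must have the same length")
    # Multiplicative interaction: motion features gate the appearance features.
    gated = [a * m for a, m in zip(appearance, motion)]
    # The residual branch sees only the gated signal.
    residual = residual_fn(gated)
    # The identity shortcut carries the ungated appearance features forward.
    return [a + r for a, r in zip(appearance, residual)]


def toy_residual_fn(x: Vector) -> Vector:
    # Hypothetical stand-in for the learned convolutional layers: ReLU(0.5 * x).
    return [max(0.0, 0.5 * v) for v in x]


appearance = [1.0, 2.0, 3.0]
motion = [0.0, 1.0, 2.0]
output = gated_residual_unit(appearance, motion, toy_residual_fn)
print(output)
```

Under this assumed form, a position with zero motion response contributes nothing through the residual branch, so the appearance feature passes through the shortcut unchanged, as the first element of the example shows.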
Citation impact
- FWCI: 31.97
- Percentile: 100%
- References: 62
Authors
3
Topics & keywords
- Action recognition
- Computer science
- Artificial intelligence
- Multiplicative function
- Residual
- Pattern recognition (psychology)
- Architecture
- Spacetime
- Sustainable cities and communities