STM: SpatioTemporal and Motion Encoding for Action Recognition
Zhejiang University · Group Sense (China)
Abstract
Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate flow stream to learn motion features. In this work, we aim to efficiently encode both kinds of features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network by introducing very limited extra…
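The two-module idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not spell out the internals of CSTM or CMM, so the function names, the 3-tap per-channel temporal filter, the zero padding, the plain adjacent-frame difference used as a motion cue, and the summation with a residual path are all assumptions made for illustration.

```python
import numpy as np


def temporal_channelwise_conv(x, kernels):
    """Hypothetical CSTM-like step: one 3-tap temporal filter per channel.

    x:       feature maps of shape (T, C, H, W)
    kernels: array of shape (C, 3), an independent filter for each channel
    """
    x = np.asarray(x, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    num_frames = x.shape[0]
    # Zero-pad the time axis so the output keeps T frames.
    padded = np.pad(x, ((1, 1), (0, 0), (0, 0), (0, 0)))
    out = np.zeros_like(x)
    for k in range(3):
        # Reshape the k-th tap to (1, C, 1, 1) so it broadcasts per channel.
        out += padded[k:k + num_frames] * kernels[:, k].reshape(1, -1, 1, 1)
    return out


def adjacent_frame_difference(x):
    """Hypothetical CMM-like step: feature difference between frames t+1 and t.

    The last time step has no successor, so it is filled with zeros (assumption).
    """
    x = np.asarray(x, dtype=float)
    diff = x[1:] - x[:-1]
    return np.concatenate([diff, np.zeros_like(x[:1])], axis=0)


def stm_like_block(x, kernels):
    """Assumed fusion: residual input plus both branches, combined by summation."""
    x = np.asarray(x, dtype=float)
    return x + temporal_channelwise_conv(x, kernels) + adjacent_frame_difference(x)
```

Both branches operate on 2D-CNN feature maps and mix information only along the time axis, which is the sense in which a 2D framework can still capture temporal and motion cues. A real block would also contain learned 2D convolutions, which this sketch omits.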
Citation impact
- FWCI: 25.72
- Percentile: 100%
- References: 58
Authors
- 5
Topics & keywords
- Computer science
- ENCODE
- Encoding (memory)
- Artificial intelligence
- Motion (physics)
- Block (permutation group theory)
- Action recognition
- Computation