Two-Stream Convolutional Networks for Action Recognition in Videos

Simonyan, Karen; Zisserman, Andrew

doi:10.48550/arxiv.1406.2199

Preprint · arXiv (Cornell University) · Jun 9, 2014 · Green OA

Karen Simonyan, Andrew Zisserman

University of Oxford

Indexed in: arXiv, DataCite

Abstract

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action…
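
The two-stream idea in the abstract can be illustrated with a minimal sketch. It assumes that the temporal network takes a stack of consecutive optical flow fields as input channels and that the two streams are combined by averaging their class probabilities (late fusion). The abstract excerpt above does not specify either detail, so the function names, the frame count, and the example scores below are hypothetical and for illustration only.

```python
import math


def temporal_input_channels(num_flow_frames):
    """Channel count when flow fields are stacked along the channel axis.

    Assumption: each flow field contributes a horizontal and a vertical
    displacement channel, so L stacked fields give 2 * L channels.
    """
    return 2 * num_flow_frames


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_scores, temporal_scores):
    """Late fusion (assumed here): average the two streams' probabilities."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [0.5 * (a + b) for a, b in zip(p_spatial, p_temporal)]


# Hypothetical raw scores for three action classes from each stream.
spatial_scores = [2.0, 0.5, 0.1]   # from a still frame (appearance)
temporal_scores = [0.2, 1.5, 0.1]  # from stacked optical flow (motion)

fused = fuse_streams(spatial_scores, temporal_scores)
predicted_class = max(range(len(fused)), key=lambda i: fused[i])
channels = temporal_input_channels(10)  # 10 stacked flow fields (illustrative)
```

Averaging is only one possible fusion rule. It is used here because it needs no extra training and keeps the appearance and motion streams fully independent until the final prediction.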

Citation impact

Total citations: 5,366

FWCI: —
Percentile: —
References: 30

Authors: 2

Topics & keywords

Keywords

Action recognition
Computer science
Action (physics)
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Physics

UN Sustainable Development Goals

Reduced inequalities