Convolutional Two-Stream Network Fusion for Video Action Recognition

Feichtenhofer, Christoph; Pinz, Axel; Zisserman, Andrew

doi:10.1109/cvpr.2016.213

Article · Jun 1, 2016 · Closed access

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman

Graz University of Technology · University of Oxford

Indexed in Crossref

Abstract

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. We make the following findings: (i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters, (ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class…
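The contrast drawn in finding (i), fusing at the softmax layer versus fusing at a convolution layer, can be illustrated with a minimal NumPy sketch. The abstract does not specify the fusion operator, so the channel concatenation followed by a 1x1 convolution used here is an assumption, and every function name, tensor shape, and weight below is hypothetical rather than taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(spatial_logits, temporal_logits):
    """Softmax-level fusion: average the class posteriors of the two towers."""
    return 0.5 * (softmax(spatial_logits) + softmax(temporal_logits))

def conv_fusion(spatial_maps, temporal_maps, weights, bias):
    """Convolution-level fusion (assumed form): stack channels, then 1x1 conv.

    spatial_maps, temporal_maps: feature maps of shape (C, H, W)
    weights: 1x1 convolution kernel of shape (C_out, 2 * C)
    bias: vector of shape (C_out,)
    """
    # Stack the two towers along the channel axis: (2C, H, W).
    stacked = np.concatenate([spatial_maps, temporal_maps], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weights, stacked) + bias[:, None, None]

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W = 4, 3, 3
spatial = rng.normal(size=(C, H, W))
temporal = rng.normal(size=(C, H, W))
weights = rng.normal(size=(C, 2 * C))
fused = conv_fusion(spatial, temporal, weights, np.zeros(C))
print(fused.shape)
```

In this sketch the fused output is a single feature map with the same spatial size as each input, so whatever layers follow it would be instantiated once rather than once per tower. That is one plausible reading of the parameter saving the abstract mentions, not a statement of the authors' exact architecture.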

Citation impact

Total citations: 2,760

FWCI: 151.89
Percentile: 100%
References: 49

Authors: 3

Topics & keywords

Keywords

Softmax function
Computer science
Fuse (electrical)
Convolutional neural network
Pooling
Artificial intelligence
Convolution (computer science)
Action recognition

UN Sustainable Development Goals

Sustainable cities and communities
