Early versus late fusion in semantic video analysis

Snoek, Cees G. M.; Worring, Marcel; Smeulders, A.W.M.

doi:10.1145/1101149.1101236

Article · Nov 6, 2005 · Green OA

University of Amsterdam

Indexed in Crossref

Abstract

Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis these streams need to be fused. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space, the latter fuses modalities in semantic space. We show by experiment on 184 hours of broadcast video data and for 20 semantic concepts, that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better the difference is more significant.
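The distinction drawn in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two modalities are synthetic random features, the function names are hypothetical, and the learner is a simple nearest-centroid scorer standing in for whatever classifier the paper actually uses. Early fusion concatenates the modality features before a single learner sees them; late fusion first learns one scorer per modality and then learns a final scorer on the resulting concept scores.

```python
import numpy as np

def fit_centroid_scorer(X, y):
    """Stand-in learner (an assumption, not the paper's classifier).

    Returns a function whose score is higher when a sample lies closer
    to the positive-class centroid than to the negative-class centroid.
    """
    pos = X[y == 1].mean(axis=0)
    neg = X[y == 0].mean(axis=0)

    def score(Z):
        return np.linalg.norm(Z - neg, axis=1) - np.linalg.norm(Z - pos, axis=1)

    return score

def early_fusion(visual, text, y):
    """Fuse in feature space: concatenate features, then learn once."""
    scorer = fit_centroid_scorer(np.hstack([visual, text]), y)
    return lambda v, t: scorer(np.hstack([v, t]))

def late_fusion(visual, text, y):
    """Fuse in semantic space: learn per modality, then learn on the scores."""
    score_visual = fit_centroid_scorer(visual, y)
    score_text = fit_centroid_scorer(text, y)
    semantic = np.column_stack([score_visual(visual), score_text(text)])
    final = fit_centroid_scorer(semantic, y)
    return lambda v, t: final(np.column_stack([score_visual(v), score_text(t)]))

# Synthetic data (hypothetical): 40 video segments, one binary concept label.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
visual = rng.normal(size=(40, 5)) + 2.0 * y[:, None]  # 5 visual features
text = rng.normal(size=(40, 3)) + 2.0 * y[:, None]    # 3 textual features

early_scores = early_fusion(visual, text, y)(visual, text)
late_scores = late_fusion(visual, text, y)(visual, text)
```

Both schemes return one score per segment; the difference is where the modalities meet. Early fusion needs a single learning step over a larger feature vector, while late fusion needs one learning step per modality plus one more on the combined scores.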

Citation impact

Total citations: 854
FWCI: 19.44
Percentile: 100%
References: 10

Authors: 3

Keywords

Computer science
Modalities
Fusion
Modality (human–computer interaction)
Point (geometry)
Sensor fusion
Semantic analysis (machine learning)
Index (typography)
