End-to-End Learning of Visual Representations From Uncurated Instructional Videos

Miech, Antoine; Alayrac, Jean-Baptiste; Smaira, Lucas; Laptev, Ivan; Šivic, Josef; Zisserman, Andrew

doi:10.1109/cvpr42600.2020.00990

Article · Jun 1, 2020 · Closed access

Université Paris Sciences et Lettres · Institut national de recherche en informatique et en automatique · +6 more institutions

Indexed in Crossref

Abstract

Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos. With this approach we are able to learn strong video representations from scratch, without the need for any manual annotation. We evaluate our representations on a wide range of four downstream tasks over eight datasets: action recognition (HMDB-51, UCF-101, Kinetics-700), text-to-video retrieval…

Citation impact

Total citations: 585

FWCI: 43.06
Percentile: 100%
References: 126

Authors: 6

Topics & keywords

Keywords

Computer science
Action recognition
Annotation
Scalability
Segmentation
Artificial intelligence
Action (physics)
Scratch

UN Sustainable Development Goals

Quality Education
