Unsupervised Learning of Visual Representations Using Videos

Wang, Xiaolong; Gupta, Abhinav

doi:10.1109/iccv.2015.320

Article | Dec 1, 2015 | Closed access

Carnegie Mellon University

Indexed in Crossref

Abstract

Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically-labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have similar visual representations in deep feature space, since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation.…
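The ranking idea in the abstract can be illustrated with a minimal sketch: a patch and its tracked counterpart should be closer in feature space than the same patch and an unrelated one. The abstract does not specify the distance measure or the margin, so the cosine distance, the margin value of 0.5, and the function names `cosine_distance` and `triplet_ranking_loss` below are assumptions chosen for illustration. The toy two-dimensional vectors stand in for CNN features and are not data from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style ranking loss over one triplet of feature vectors.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive. The margin value is an assumption.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy stand-ins for deep features (illustrative values only).
anchor = [1.0, 0.0]          # first patch of a track
tracked = [0.9, 0.1]         # patch connected to it by the track
random_patch = [0.0, 1.0]    # unrelated patch

# Correct ordering: tracked patch is the positive, so no penalty.
loss_good = triplet_ranking_loss(anchor, tracked, random_patch)

# Swapped ordering: the unrelated patch is treated as the positive,
# so the loss is large.
loss_bad = triplet_ranking_loss(anchor, random_patch, tracked)
```

In a Siamese-triplet setup, the three inputs would pass through networks that share weights, and a loss of this shape would be minimized over many triplets; the sketch covers only the loss on precomputed features.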

Citation impact

Total citations: 963
FWCI: 44.73
Percentile: 100%
References: 76

Authors: 2

Topics & keywords

Keywords

Computer science
Artificial intelligence
Convolutional neural network
Unsupervised learning
Representation (politics)
Feature learning
Pattern recognition (psychology)
Bounding overwatch
