Article | Dec 1, 2015 | Closed access

Unsupervised Learning of Visual Representations Using Videos

Carnegie Mellon University

Indexed in Crossref

Abstract

Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have similar visual representations in deep feature space, since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation.…
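
The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes a hinge-style triplet loss over cosine distance with a margin of 0.5; the abstract does not specify the distance, the margin, or the network, so these choices and all names below (`cosine_distance`, `triplet_ranking_loss`, the toy feature vectors) are hypothetical stand-ins, not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss that is zero once the tracked pair is closer than the
    unrelated pair by at least `margin` (an assumed value)."""
    d_pos = cosine_distance(anchor, positive)   # first patch vs. tracked patch
    d_neg = cosine_distance(anchor, negative)   # first patch vs. unrelated patch
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D "features" standing in for CNN outputs (hypothetical values).
first_patch = [1.0, 0.0]
tracked_patch = [1.0, 0.0]   # same object later in the track
random_patch = [0.0, 1.0]    # patch from an unrelated video

# Correct ordering: the tracked patch is already much closer than the random one.
loss_easy = triplet_ranking_loss(first_patch, tracked_patch, random_patch)

# Violated ordering: roles swapped, so the loss is positive.
loss_hard = triplet_ranking_loss(first_patch, random_patch, tracked_patch)

print(loss_easy, loss_hard)
```

In a real training setup the three vectors would come from three weight-sharing copies of one CNN, and the loss would be minimized by backpropagation; the sketch only shows how the loss separates a satisfied ranking from a violated one.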

Citation impact

Total citations: 963
FWCI: 44.73
Percentile: 100%
References: 76
Authors

2

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Convolutional neural network
  • Unsupervised learning
  • Representation (politics)
  • Feature learning
  • Pattern recognition (psychology)
  • Bounding overwatch