Article | Dec 1, 2015 | Closed access
Unsupervised Learning of Visual Representations Using Videos
Indexed in Crossref
Abstract
Is strong supervision necessary for learning a good visual representation? Do we really need millions of semantically labeled images to train a Convolutional Neural Network (CNN)? In this paper, we present a simple yet surprisingly powerful approach for unsupervised learning of CNNs. Specifically, we use hundreds of thousands of unlabeled videos from the web to learn visual representations. Our key idea is that visual tracking provides the supervision. That is, two patches connected by a track should have similar visual representations in deep feature space, since they probably belong to the same object or object part. We design a Siamese-triplet network with a ranking loss function to train this CNN representation.…
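The ranking idea in the abstract can be illustrated with a small sketch. The abstract does not give the loss itself, so everything here is an assumption made for illustration: the use of cosine distance, the hinge form, the `margin` value of 0.5, the function names, and the toy two-dimensional feature vectors. It shows only the general shape of a triplet ranking loss, in which a patch and its tracked counterpart are pushed closer together than the patch and an unrelated one.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style ranking loss (assumed form, not taken from the abstract).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy features, made up for illustration only.
first_patch = [1.0, 0.0]    # patch at the start of a track
tracked_patch = [1.0, 0.0]  # patch at the end of the same track
random_patch = [0.0, 1.0]   # patch sampled from elsewhere

# Tracked pair is already closer than the random pair: no penalty.
loss_good = triplet_ranking_loss(first_patch, tracked_patch, random_patch)

# Roles swapped: the "positive" is farther than the "negative": penalized.
loss_bad = triplet_ranking_loss(first_patch, random_patch, tracked_patch)
```

In a real network the three vectors would be the outputs of weight-sharing CNN branches, and the loss would be minimized by gradient descent over many such triplets; the sketch only evaluates the loss on fixed vectors.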
Citation impact
- Total citations: 963
- FWCI: 44.73
- Percentile: 100%
- References: 76
Authors: 2
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Convolutional neural network
- Unsupervised learning
- Representation (politics)
- Feature learning
- Pattern recognition (psychology)
- Bounding overwatch