DINOv2: Learning Robust Visual Features without Supervision
Institut national de recherche en informatique et en automatique
Abstract
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and…
Citation impact
- FWCI: not available
- Percentile: not available
- References: 131
Authors (26)
- Maxime Oquab (corresponding author)
Institut national de recherche en informatique et en automatique
- Timothée Darcet
Institut national de recherche en informatique et en automatique
- Théo Moutakanni
Institut national de recherche en informatique et en automatique
- Huy Vo
Institut national de recherche en informatique et en automatique
- Marc Szafraniec
Institut national de recherche en informatique et en automatique
Topics & keywords
- Computer science
- Pipeline (software)
- Artificial intelligence
- Machine learning
- Scale (ratio)
- Training set
- Image (mathematics)
- Quality Education