Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mila - Quebec Artificial Intelligence Institute · McGill University · +1 more institution
Abstract
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically,…
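The masking strategy described in the abstract can be illustrated with a minimal sketch of block sampling on a grid of image patches. It covers only the sampling of one context block and several target blocks, not the encoders or the predictor. The grid size, the number of target blocks, the scale and aspect-ratio ranges, and the step that removes target patches from the context are illustrative assumptions that the abstract does not state; `sample_block` and `sample_masks` are hypothetical helper names.

```python
import math
import random


def sample_block(grid, scale_range, aspect_range, rng):
    """Sample a rectangular block of patches on a grid x grid patch layout.

    scale_range is the fraction of the grid the block should cover and
    aspect_range is its height/width ratio (both assumed, not from the source).
    """
    scale = rng.uniform(*scale_range)
    aspect = rng.uniform(*aspect_range)
    area = scale * grid * grid
    # Derive height and width from area and aspect ratio, clamped to the grid.
    h = max(1, min(grid, round(math.sqrt(area * aspect))))
    w = max(1, min(grid, round(math.sqrt(area / aspect))))
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}


def sample_masks(grid=14, num_targets=4, seed=0):
    """Return (context, targets) as sets of (row, col) patch indices."""
    rng = random.Random(seed)
    # Assumed: target blocks are fairly large so they capture semantic content.
    targets = [
        sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
        for _ in range(num_targets)
    ]
    # Assumed: one large, spatially distributed context block.
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Assumed: drop target patches from the context so prediction is non-trivial.
    for t in targets:
        context -= t
    return context, targets


if __name__ == "__main__":
    context, targets = sample_masks()
    print("context patches:", len(context))
    print("target block sizes:", [len(t) for t in targets])
```

In a full training loop, the context patches would be fed to a context encoder and a predictor would be asked to output the representations of each target block; that part is omitted here because the abstract does not specify it.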
Citation impact
- FWCI: 46.32
- Percentile: 100%
- References: 111
Authors
- 8
Topics & keywords
- Computer science
- Embedding
- Artificial intelligence
- Block (permutation group theory)
- Scalability
- Machine learning
- Feature learning
- Pattern recognition (psychology)