Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Assran, Mahmoud; Duval, Quentin; Misra, Ishan; Bojanowski, Piotr; Vincent, P.; Rabbat, Michael; LeCun, Yann; Ballas, Nicolas

doi:10.1109/cvpr52729.2023.01499

Article · Jun 1, 2023 · Closed access

Mila - Quebec Artificial Intelligence Institute · McGill University · +1 more institution

Indexed in Crossref

Abstract

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically,…
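The masking idea in the abstract (one informative context block, several sufficiently large target blocks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sample_block` and `sample_masks`, the 14x14 patch grid, the scale and aspect-ratio ranges, and the step that removes target patches from the context are all illustrative choices that do not come from this record.

```python
import math
import random


def sample_block(grid, scale_range, aspect_range, rng):
    """Sample a rectangular block of patch indices on a grid x grid lattice."""
    scale = rng.uniform(*scale_range)      # fraction of all patches to cover
    aspect = rng.uniform(*aspect_range)    # height / width ratio of the block
    area = scale * grid * grid
    # Clamp height and width so the block always fits inside the grid.
    h = max(1, min(grid, round(math.sqrt(area * aspect))))
    w = max(1, min(grid, round(math.sqrt(area / aspect))))
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {r * grid + c
            for r in range(top, top + h)
            for c in range(left, left + w)}


def sample_masks(grid=14, num_targets=4, seed=0):
    """Return (context, targets) as sets of flat patch indices.

    All ranges below are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    # Target blocks: fairly large, so each one covers a semantic region.
    targets = [sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
               for _ in range(num_targets)]
    # Context block: one large, spatially spread block.
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Assumption: drop target patches from the context so that
    # predicting the targets is not trivial.
    for t in targets:
        context -= t
    return context, targets


if __name__ == "__main__":
    context, targets = sample_masks()
    print(len(context), [len(t) for t in targets])
```

In a full training loop, an encoder would embed the context patches and a predictor would regress the embeddings of each target block; those components are outside the scope of this sketch.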

Citation impact

Total citations: 280

FWCI: 46.32
Percentile: 100%
References: 111

Authors: 8

Topics & keywords

Keywords

Computer science
Embedding
Artificial intelligence
Block (permutation group theory)
Scalability
Machine learning
Feature learning
Pattern recognition (psychology)
