EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Fang, Yuxin; Wang, Wen; Xie, Binhui; Sun, Quan; Wu, Ledell; Wang, Xinggang; Huang, Tiejun; Wang, Xinlong; Cao, Yue

doi:10.1109/cvpr52729.2023.01855

Article · Jun 1, 2023 · Closed access

Huazhong University of Science and Technology · Beijing Academy of Artificial Intelligence · +2 more institutions

Indexed in Crossref

Abstract

We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked-out image-text aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters and set new records on a broad range of representative vision downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation and semantic segmentation, without heavy supervised training. Moreover, we observe that quantitative changes in scaling EVA result in qualitative changes in transfer learning performance that…
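The pretext task described in the abstract (predicting image-text aligned features at masked patch positions from the visible patches) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the negative cosine similarity loss, the 40% mask ratio, the 196-patch grid, the 32-dimensional features, and the helper names random_patch_mask and masked_feature_loss. Random vectors stand in for both the teacher features and the ViT's predictions.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans, True where a patch is masked out."""
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def masked_feature_loss(predicted, target, mask):
    """Mean negative cosine similarity, computed on masked patches only.

    The loss choice is an assumption; visible patches contribute nothing.
    """
    sims = [cosine(p, t) for p, t, m in zip(predicted, target, mask) if m]
    return -sum(sims) / len(sims)


rng = random.Random(0)
num_patches, dim = 196, 32  # hypothetical sizes (a 14 x 14 patch grid)

# Stand-in for image-text aligned target features, one vector per patch.
target = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
# Stand-in for the features a student ViT would predict.
predicted = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]

mask = random_patch_mask(num_patches, 0.4, rng)
loss_random = masked_feature_loss(predicted, target, mask)
loss_perfect = masked_feature_loss(target, target, mask)
```

Under these assumptions, a perfect reconstruction drives the loss toward -1, while unrelated predictions leave it near 0, which is the signal a model would be trained to reduce.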

Citation impact

Total citations: 446

FWCI: 50.54
Percentile: 100%
References: 188

Authors: 9

Topics & keywords

Computer science
Artificial intelligence
Segmentation
Initialization
Object detection
Task (project management)
Image segmentation
Cognitive neuroscience of visual object recognition

UN Sustainable Development Goals

Quality Education
