ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Woo, Sanghyun; Debnath, Shoubhik; Hu, Ronghang; Chen, Xinlei; Liu, Zhuang; Kweon, In So; Xie, Saining

doi:10.1109/cvpr52729.2023.01548

Article · Jun 1, 2023 · Closed access

Korea Advanced Institute of Science and Technology · Kootenay Association for Science & Technology · +1 more institution

Indexed in Crossref

Abstract

Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt [33], have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE) [14]. However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization…

Citation impact

Total citations: 1,285

FWCI: 212.43
Percentile: 100%
References: 68

Authors: 7

Topics & keywords

Keywords

Computer science
Artificial intelligence
Machine learning
Normalization (sociology)
Feature learning
Pattern recognition (psychology)
Autoencoder
Segmentation

UN Sustainable Development Goals

Industry, innovation and infrastructure
