ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Korea Advanced Institute of Science and Technology · Kootenay Association for Science & Technology · +1 more institution
Abstract
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and a performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt [33], have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE) [14]. However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization…
Citation impact
- FWCI: 212.43
- Percentile: 100%
- References: 68
Authors
- 7
Topics & keywords
- Computer science
- Artificial intelligence
- Machine learning
- Normalization (sociology)
- Feature learning
- Pattern recognition (psychology)
- Autoencoder
- Segmentation
- Industry, innovation and infrastructure