SimMIM: a Simple Framework for Masked Image Modeling

Tsinghua University · Microsoft Research Asia (China) · +1 more institution

Indexed in Crossref

Abstract

This paper presents SimMIM, a simple framework for masked image modeling. We have simplified recently proposed relevant approaches, without the need for special designs, such as block-wise masking and tokenization via discrete VAE or clustering. To investigate what makes a masked image modeling task learn good representations, we systematically study the major components in our framework, and find that the simple designs of each component have revealed very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a powerful pre-text task; 2) predicting RGB values of raw pixels by direct regression performs no worse than the…
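The two design points named in the abstract, random masking at a coarse patch size and direct regression of raw RGB values, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the 192-pixel image size, the 0.6 mask ratio, and the choice of an L1 loss restricted to masked pixels are all assumptions made for illustration; only the masked patch size of 32 comes from the abstract.

```python
import numpy as np


def random_patch_mask(img_size, mask_patch_size, mask_ratio, rng):
    """Return a boolean (H, W) mask; True marks pixels inside masked patches.

    Hypothetical helper: patches are chosen uniformly at random, with no
    block-wise structure.
    """
    grid = img_size // mask_patch_size           # patches per side
    num_patches = grid * grid
    num_masked = int(round(mask_ratio * num_patches))
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.choice(num_patches, size=num_masked, replace=False)] = True
    patch_mask = flat.reshape(grid, grid)
    # Expand the patch-level mask to pixel resolution.
    return np.repeat(np.repeat(patch_mask, mask_patch_size, axis=0),
                     mask_patch_size, axis=1)


def masked_l1_loss(pred, target, pixel_mask):
    """Mean absolute error over masked pixels only; pred/target are (H, W, 3).

    Assumption: the regression loss is L1 and ignores visible pixels.
    """
    diff = np.abs(pred - target)[pixel_mask]     # shape (n_masked_pixels, 3)
    return diff.mean()


rng = np.random.default_rng(0)
# Assumed settings: 192x192 image, 60% of patches masked; patch size 32 as in the abstract.
mask = random_patch_mask(img_size=192, mask_patch_size=32, mask_ratio=0.6, rng=rng)
target = rng.random((192, 192, 3))               # stand-in for a normalized RGB image
pred = np.zeros_like(target)                     # stand-in for a model's prediction
loss = masked_l1_loss(pred, target, mask)
print(mask.shape, int(mask.sum()), float(loss))
```

In a real pipeline the prediction would come from an encoder and prediction head applied to the masked image; here `pred` is a placeholder so the masking and loss logic can be run on their own.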

Citation impact

Total citations: 1,142
FWCI: 63.44
Percentile: 100%
References: 89

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Leverage (statistics)
  • Artificial intelligence
  • Masking (illustration)
  • Pattern recognition (psychology)
  • Code (set theory)
  • Task (project management)
  • Machine learning