Abstract

Recent progress on Transformers and multilayer perceptron (MLP) models provides new network architectural designs for computer vision tasks. Although these models have proved effective in many vision tasks such as image recognition, there remain challenges in adapting them for low-level vision. The inflexibility to support high-resolution images and the limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs.…

Citation impact

Total citations: 552
FWCI: 31.05
Percentile: 100%
References: 154

Authors: 7

Topics & keywords

Keywords
  • Computer science
  • Maxim
  • Image processing
  • Computer vision
  • Artificial intelligence
  • Image (mathematics)
  • Computer graphics (images)
UN Sustainable Development Goals
  • Industry, innovation and infrastructure