A ConvNet for the 2020s

Liu, Zhuang; Mao, Hanzi; Wu, Chao-Yuan; Feichtenhofer, Christoph; Darrell, Trevor; Xie, Saining

doi:10.1109/cvpr52688.2022.01167

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Zhuang Liu · Hanzi Mao · Chao-Yuan Wu · Christoph Feichtenhofer · Trevor Darrell

Berkeley College · University of California, Berkeley · +1 more institution

Indexed in Crossref

Abstract

The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather…

Citation impact

Total citations: 6,915

FWCI: 365.25
Percentile: 100%
References: 121

Authors: 6

Topics & keywords

Keywords

Transformer
Computer science
Artificial intelligence
Segmentation
Scalability
Object detection
Machine learning
Image segmentation