InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wang, Wenhai; Dai, Jifeng; Chen, Zhe; Huang, Zhenhang; Li, Zhiqi; Zhu, Xizhou; Hu, Xiaowei; Lü, Tong; Lu, Lewei; Li, Hongsheng; Wang, Xiaogang; Qiao, Yu

doi:10.1109/cvpr52729.2023.01385

Article · Jun 1, 2023 · Closed access

ShangHai JiAi Genetics & IVF Institute · Shanghai Artificial Intelligence Laboratory · +5 more institutions

Indexed in Crossref

Abstract

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the…

Citation impact

Total citations: 883

FWCI: 100.18
Percentile: 100%
References: 118

Authors: 12

Topics & keywords

Keywords

Computer science
Convolutional neural network
Artificial intelligence
Segmentation
Convolution (computer science)
Scale (ratio)
Deep learning
Pattern recognition (psychology)

Funding

National Natural Science Foundation of China
Awards: 61672273, 61832008