InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
ShangHai JiAi Genetics & IVF Institute · Shanghai Artificial Intelligence Laboratory · +5 more institutions
Abstract
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage. This work presents a new large-scale CNN-based foundation model, termed InternImage, which, like ViTs, can benefit from increasing parameters and training data. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator. As a result, the model not only has the large effective receptive field required by downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information. As a result, the…
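The abstract names deformable convolution as InternImage's core operator. As a rough illustration, a generic modulated deformable convolution (in the style of DCNv2) computes each output as $y(p_0)=\sum_{k=1}^{K} w_k\, m_k\, x(p_0+p_k+\Delta p_k)$. Here $p_k$ enumerates a regular kernel grid, $\Delta p_k$ are learned offsets, and $m_k$ are modulation scalars; in a real network, both are predicted from the input. The pure-Python sketch below evaluates that sum at a single output location on a single-channel grid, using bilinear interpolation for fractional sampling points. It is an illustration under stated assumptions, not the paper's operator. The names `bilinear` and `deform_conv_point` are hypothetical, offsets and masks are passed in by hand rather than predicted, and InternImage's own operator differs in details not covered here.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Regular 3x3 sampling grid p_k, as (dy, dx) offsets from the output location p0.
KERNEL_3X3: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: Grid, py: float, px: float) -> float:
    """Sample grid x at fractional (py, px); locations outside the grid read as zero."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Each of the four neighbours is weighted by (1 - |dy|) * (1 - |dx|).
                value += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return value


def deform_conv_point(
    x: Grid,
    p0: Tuple[int, int],
    weights: Sequence[float],
    offsets: Sequence[Tuple[float, float]],
    masks: Sequence[float],
    grid: Sequence[Tuple[int, int]] = KERNEL_3X3,
) -> float:
    """Hypothetical helper: y(p0) = sum_k w_k * m_k * x(p0 + p_k + delta_p_k)."""
    out = 0.0
    for (dy, dx), w_k, (oy, ox), m_k in zip(grid, weights, offsets, masks):
        # Sample at the regular grid point shifted by its (assumed, hand-set) offset.
        out += w_k * m_k * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return out


if __name__ == "__main__":
    x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
    weights = [1 / 9] * 9
    masks = [1.0] * 9
    # Zero offsets: an ordinary 3x3 box filter centred at (2, 2).
    print(deform_conv_point(x, (2, 2), weights, [(0.0, 0.0)] * 9, masks))
    # Shift every sampling point half a pixel to the right.
    print(deform_conv_point(x, (2, 2), weights, [(0.0, 0.5)] * 9, masks))
```

With all offsets zero and all masks equal to 1, the sum reduces to an ordinary 3×3 convolution (cross-correlation) at $p_0$. The demo therefore prints approximately 12.0, the mean of the 3×3 neighbourhood. Shifting every offset half a pixel to the right gives approximately 12.5. Nonzero offsets let the same nine weights read from input-dependent, off-grid locations, which is the kind of adaptive spatial aggregation the abstract describes.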
Citation impact
- FWCI: 100.18
- Percentile: 100%
- References: 118
Authors (12)
- Wenhai Wang (corresponding author)
ShangHai JiAi Genetics & IVF Institute, Shanghai Artificial Intelligence Laboratory
- Jifeng Dai
ShangHai JiAi Genetics & IVF Institute, Tsinghua University, Shanghai Artificial Intelligence Laboratory
- Zhe Chen
Nanjing University, ShangHai JiAi Genetics & IVF Institute, Shanghai Artificial Intelligence Laboratory
- Zhenhang Huang
Shanghai Artificial Intelligence Laboratory, ShangHai JiAi Genetics & IVF Institute
- Zhiqi Li
Shanghai Artificial Intelligence Laboratory, Nanjing University, ShangHai JiAi Genetics & IVF Institute
Topics & keywords
- Computer science
- Convolutional neural network
- Artificial intelligence
- Segmentation
- Convolution (computer science)
- Scale (ratio)
- Deep learning
- Pattern recognition (psychology)