Side Adapter Network for Open-Vocabulary Semantic Segmentation
Huazhong University of Science and Technology · Microsoft Research Asia (China)
Abstract
This paper presents a new framework for open-vocabulary semantic segmentation with a pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias, which is applied in the CLIP model to recognize the class of masks. This decoupled design benefits CLIP in recognizing the class of mask proposals. Since the attached side network can reuse CLIP features, it can be very light. In addition, the entire network can be trained end-to-end, allowing the side network to be…
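The role of the attention bias described in the abstract can be illustrated with a minimal sketch. It assumes the bias is added to the attention logits before the softmax, a detail the abstract does not spell out. The function name `biased_attention` and the toy values are hypothetical and are not taken from the SAN implementation.

```python
import math

def biased_attention(query, keys, values, bias):
    """Single-query attention with an additive bias on the logits.

    query: list of d floats
    keys, values: lists of n vectors
    bias: list of n floats, added to the logits before the softmax
          (assumption: this is how a predicted attention bias would act)
    """
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    # Numerically stable softmax over the key positions.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return out, weights

# Toy example: two image tokens with identical keys.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

# Zero bias: both tokens receive equal attention.
_, uniform = biased_attention(query, keys, values, [0.0, 0.0])
# Strongly negative bias on the second token: attention focuses on the first.
focused_out, focused = biased_attention(query, keys, values, [0.0, -1e9])
```

In this toy setting, a bias that is strongly negative outside a mask proposal restricts the query to tokens inside the proposal, which is one way a frozen model could be steered toward recognizing a specific region without changing its weights.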
Citation impact
- FWCI: 29.85
- Percentile: 100%
- References: 63
Authors
- Mengde Xu (corresponding author)
Huazhong University of Science and Technology, Microsoft Research Asia (China)
- Zheng Zhang
Microsoft Research Asia (China), Huazhong University of Science and Technology
- Fangyun Wei
Microsoft Research Asia (China)
- Hanping Hu
Microsoft Research Asia (China)
- Xiang Bai
Huazhong University of Science and Technology
Topics & keywords
- Computer science
- Segmentation
- Inference
- Vocabulary
- Artificial intelligence
- Adapter (computing)
- Visualization
- Task (project management)
- Quality Education