Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
Hong Kong University of Science and Technology · Tsinghua University
Abstract
In this paper, we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch that supports all image segmentation tasks (instance, panoptic, and semantic). It takes the dot product of the query embeddings from DINO with a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing…
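The mask branch described above turns each query embedding into a mask by taking its dot product with the embedding at every pixel. Below is a minimal sketch of that step in plain Python, under stated assumptions. It assumes N query embeddings of dimension C and a C x H x W pixel embedding map. It also assumes that a per-pixel sigmoid with a 0.5 threshold produces the binary masks. The function name `predict_masks` and the `threshold` parameter are hypothetical illustrations and are not taken from the Mask DINO code base. The sketch does not show how the pixel embedding map itself is computed.

```python
import math
from typing import List

Matrix = List[List[float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_masks(queries: Matrix, pixel_emb: List[Matrix],
                  threshold: float = 0.5) -> List[List[List[int]]]:
    """Hypothetical sketch: one binary mask per query embedding.

    queries:   N x C, one C-dim embedding per query
    pixel_emb: C x H x W, a C-dim embedding at every pixel
    returns:   N x H x W masks with values in {0, 1}
    """
    channels = len(pixel_emb)
    height, width = len(pixel_emb[0]), len(pixel_emb[0][0])
    masks = []
    for q in queries:
        if len(q) != channels:
            raise ValueError("query dim must match pixel embedding channels")
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Dot product of the query with this pixel's embedding -> mask logit.
                logit = sum(q[c] * pixel_emb[c][y][x] for c in range(channels))
                # Assumed binarization: sigmoid probability above the threshold.
                row.append(1 if _sigmoid(logit) > threshold else 0)
            mask.append(row)
        masks.append(mask)
    return masks


# Usage: two 2-dim queries over a 2 x 2 pixel embedding map.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixel_emb = [
    [[2.0, -2.0], [2.0, -2.0]],  # channel 0
    [[-1.0, 3.0], [-1.0, 3.0]],  # channel 1
]
print(predict_masks(queries, pixel_emb))
```

Each query selects the pixels whose embeddings point in a similar direction. In a real model this per-pixel loop would be a single batched tensor contraction (an einsum over the channel dimension).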
Citation impact
- FWCI: 53.07
- Percentile: 100%
- References: 38
Authors: 7
Topics & keywords
- Segmentation
- Computer science
- Artificial intelligence
- Scalability
- Image segmentation
- Embedding
- Object detection
- Segmentation-based object categorization