Article · Jun 1, 2023 · Closed access

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Hong Kong University of Science and Technology · Tsinghua University

Indexed in Crossref

Abstract

In this paper, we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch that supports all image segmentation tasks (instance, panoptic, and semantic). It takes the dot product of DINO's query embeddings with a high-resolution pixel embedding map to predict a set of binary masks. Some key components of DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing…
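The mask prediction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `predict_masks`, the tensor shapes, the sigmoid activation, and the 0.5 threshold are all assumptions chosen for illustration. The only part taken from the abstract is the dot product between query embeddings and a pixel embedding map.

```python
import numpy as np

def predict_masks(query_embed, pixel_embed, threshold=0.5):
    """Illustrative sketch only (names, shapes, and threshold are assumptions).

    query_embed: (N, C) array, one C-dimensional embedding per query.
    pixel_embed: (C, H, W) array, one C-dimensional embedding per pixel.
    Returns an (N, H, W) boolean array, one binary mask per query.
    """
    # Dot product of every query embedding with every pixel embedding.
    logits = np.einsum("nc,chw->nhw", query_embed, pixel_embed)
    # Assumed post-processing: sigmoid, then a fixed threshold.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

# Usage with random stand-in data: 3 queries, 8 channels, a 16x16 map.
rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 8))
pixels = rng.standard_normal((8, 16, 16))
masks = predict_masks(queries, pixels)
```

Each query yields one mask, so the number of predicted masks equals the number of queries regardless of image size.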

Citation impact

  • Total citations: 473
  • FWCI: 53.07
  • Percentile: 100%
  • References: 38
Authors

7

Topics & keywords

Keywords
  • Segmentation
  • Computer science
  • Artificial intelligence
  • Scalability
  • Image segmentation
  • Embedding
  • Object detection
  • Segmentation-based object categorization