Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Li, Feng; Zhang, Hao; Xu, Huaizhe; Liu, Shilong; Zhang, Lei; Ni, Lionel M.; Shum, Heung‐Yeung

doi:10.1109/cvpr52729.2023.00297

Article · Jun 1, 2023 · Closed access

Hong Kong University of Science and Technology · Tsinghua University

Indexed in Crossref

Abstract

In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic). It makes use of the query embeddings from DINO to dot-product a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing…
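The mask prediction step described in the abstract, where each query embedding is dot-multiplied with a high-resolution pixel embedding map to give one binary mask per query, can be sketched roughly as follows. This is only an illustrative NumPy sketch, not the authors' implementation: the function name `predict_masks`, the tensor shapes, the sigmoid plus 0.5 threshold, and the random inputs are all assumptions made for the example.

```python
import numpy as np


def predict_masks(query_embed, pixel_embed, threshold=0.5):
    """Hypothetical helper: turn query embeddings into binary masks.

    query_embed: (N, C) array, one C-dimensional embedding per query.
    pixel_embed: (C, H, W) array, one C-dimensional embedding per pixel.
    Returns (masks, logits), both of shape (N, H, W).
    """
    # Dot product between every query and the embedding at every pixel.
    logits = np.einsum("nc,chw->nhw", query_embed, pixel_embed)
    # Assumed post-processing: sigmoid, then a fixed threshold.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold, logits


# Toy inputs (assumed sizes): 3 queries, 8 channels, a 16x16 pixel map.
rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 8))
pixels = rng.standard_normal((8, 16, 16))

masks, logits = predict_masks(queries, pixels)
print(masks.shape, masks.dtype)
```

Each query yields its own full-resolution mask, so the number of predicted masks equals the number of queries regardless of image size.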

Citation impact

Total citations: 473

FWCI: 53.07
Percentile: 100%
References: 38

Authors: 7

Topics & keywords

Keywords

Segmentation
Computer science
Artificial intelligence
Scalability
Image segmentation
Embedding
Object detection
Segmentation-based object categorization
