Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Liu, Shilong; Zeng, Zhaoyang; Ren, Tianhe; Li, Feng; Zhang, Hao; Yang, Jie; Jiang, Qing; Li, Chunyuan; Yang, Jianwei; Su, Hang; Zhu, Jun; Zhang, Lei

doi:10.48550/arxiv.2303.05499

Preprint | arXiv (Cornell University) | Mar 9, 2023 | Green OA


Indexed in: arXiv, DataCite

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object…
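The abstract names a language-guided query selection step but the truncated text does not spell it out. The sketch below shows one plausible reading under stated assumptions: each image token is scored by its highest dot-product similarity to any text token, and the top-scoring tokens are kept as decoder queries. The function name `select_queries`, its parameters, and the plain-list feature format are hypothetical choices for illustration, not the authors' code.

```python
from typing import List, Sequence


def select_queries(
    image_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to the text prompt.

    Assumption: similarity is a plain dot product, and an image token's
    score is its maximum similarity over all text tokens.
    """
    scores = []
    for img_vec in image_feats:
        # Dot product of this image token with every text token.
        sims = [
            sum(a * b for a, b in zip(img_vec, txt_vec))
            for txt_vec in text_feats
        ]
        # Keep the best-matching text token's similarity as the score.
        scores.append(max(sims))

    # Rank image tokens by score, highest first, and keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


if __name__ == "__main__":
    # Toy 2-D features: four image tokens, two text tokens.
    image_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
    text_tokens = [[1.0, 0.0], [0.9, 0.1]]
    print(select_queries(image_tokens, text_tokens, num_queries=2))
```

In a real detector the features would be batched tensors and the selected indices would initialize decoder queries; the pure-Python lists here only keep the sketch dependency-free.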

Citation impact

Total citations: 243

FWCI: —
Percentile: —
References: 0

Authors: 12

Topics & keywords

Keywords

Computer science
Artificial intelligence
Object (grammar)
Object detection
Set (abstract data type)
Detector
Benchmark (surveying)
Open set


Funding

Hong Kong University of Science and Technology