Preprint | arXiv (Cornell University) | Mar 9, 2023 | Green OA

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Indexed in: arXiv, DataCite

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object…
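The abstract names a language-guided query selection step but does not spell out its mechanics here. The following is a minimal sketch of one plausible reading, assuming that each image token is scored by its best dot-product similarity to any text token and that the top-scoring image tokens become decoder queries. The function name `select_queries`, its parameters, and the toy feature vectors are hypothetical and are not taken from the paper's code.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Pick the image tokens most similar to any text token.

    image_feats: list of image-token vectors (each a list of floats)
    text_feats:  list of text-token vectors of the same dimension
    Returns the indices of the top `num_queries` image tokens, best first.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For each image token, keep its best score over all text tokens.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens by that score, highest first (ties keep input order).
    ranked = sorted(range(len(image_feats)), key=lambda i: -scores[i])
    return ranked[:num_queries]


# Toy example (assumed values): 4 image tokens, 2 text tokens, 2-D features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[0.0, 2.0], [1.0, 1.0]]
selected = select_queries(image_feats, text_feats, num_queries=2)
```

In this toy case the second image token aligns best with the text, so it is selected first. A real detector would run the same idea on batched tensors rather than Python lists.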

Citation impact

Total citations: 243
References: 0

Authors

12 authors

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Object (grammar)
  • Object detection
  • Set (abstract data type)
  • Detector
  • Benchmark (surveying)
  • Open set

Funding