Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Indexed in: arXiv, DataCite
Abstract
In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object…
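The abstract names language-guided query selection as one of the three fusion components. As a rough illustration only, the toy sketch below picks the image tokens that are most similar to any text token and uses their indices as decoder queries. The function name `select_queries`, the plain dot-product similarity, and the tiny feature vectors are assumptions made for this example; this is not the authors' implementation.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper for illustration; features are plain lists of floats.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For each image token, keep its best similarity over all text tokens.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens by that score and keep the top num_queries indices.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy data (assumed): four image tokens and two text tokens in 2-D.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]
selected = select_queries(image_feats, text_feats, 2)
```

In this toy setup the selected indices point at the image tokens that best match the text prompt, which is the sense in which the query choice is guided by language.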
Citation impact
- Total citations: 243
- FWCI: —
- Percentile: —
- References: 0
Authors
12
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Object (grammar)
- Object detection
- Set (abstract data type)
- Detector
- Benchmark (surveying)
- Open set