Article · Jun 16, 2024 · Closed access

YOLO-World: Real-Time Open-Vocabulary Object Detection

Huazhong University of Science and Technology · Tencent (China)

Indexed in Crossref

Abstract

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot…

Citation impact

Total citations: 534
FWCI: 119.48
Percentile: 100%
References: 83

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Vocabulary
  • Object detection
  • Object (grammar)
  • Artificial intelligence
  • Linguistics
  • Pattern recognition (psychology)
UN Sustainable Development Goals
  • Quality Education

Funding