YOLOv12: Attention-Centric Real-Time Object Detectors

Tian, Yunjie; Ye, Qixiang; Doermann, David

doi:10.48550/arxiv.2502.12524

Preprint, arXiv.org, Feb 18, 2025 (Green OA)

Indexed in: arXiv, DataCite

Abstract

Enhancing the network architecture of the YOLO framework has long been crucial, but efforts have focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capability. This is because attention-based models have been unable to match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of previous CNN-based frameworks while harnessing the performance benefits of attention mechanisms. YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by…

Citation impact

218 total citations

Topics & keywords

Keywords

Computer science
Object (grammar)
Detector
Computer vision
Artificial intelligence
Telecommunications
