Recognize Anything: A Strong Image Tagging Model

Zhang, Youcai; Huang, Xinyu; Ma, Jinyu; Li, Zhaoyang; Luo, Zhaochuan; Xie, Yanchun; Qin, Yuzhuo; Luo, Tong; Li, Yaqian; Liu, Shilong; Guo, Yandong; Zhang, Lei

doi:10.1109/cvprw63382.2024.00179

Article · Jun 17, 2024 · Closed access

Indexed in Crossref

Abstract

We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for foundation models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. By leveraging large-scale image-text pairs for training instead of manual annotations, RAM introduces a new paradigm for image tagging. The development of RAM comprises four key steps. Firstly, annotation-free image tags are obtained at scale through automatic text semantic parsing. Subsequently, a preliminary model is trained for automatic annotation by unifying the captioning and tagging tasks, supervised by the original texts and parsed tags, respectively.…
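The first two steps of the abstract (parsing tags out of captions, then supervising a model jointly on captions and parsed tags) can be illustrated with a small sketch. It rests on several assumptions not taken from the paper. The abstract does not say which parser is used, so a toy greedy phrase matcher over a hypothetical vocabulary (`TAG_VOCAB`, `SYNONYMS`) stands in for it. A plain binary cross-entropy stands in for the actual tagging loss. The captioning loss is treated as a precomputed number, and the weight `weight` is hypothetical. This is not the authors' implementation.

```python
import math
import re

# Hypothetical tag vocabulary; a real label system would be far larger.
TAG_VOCAB = ["dog", "frisbee", "grass", "park", "man", "red car", "car"]

# Hypothetical synonym map folding surface forms into canonical tags.
SYNONYMS = {"dogs": "dog", "puppy": "dog", "cars": "car", "men": "man"}


def parse_tags(caption, vocab=TAG_VOCAB, synonyms=SYNONYMS):
    """Extract vocabulary tags from a caption by greedy longest-phrase matching."""
    words = [synonyms.get(w, w) for w in re.findall(r"[a-z]+", caption.lower())]
    vocab_set = set(vocab)
    max_len = max(len(t.split()) for t in vocab)
    tags, i = set(), 0
    while i < len(words):
        # Try the longest phrase first so "red car" wins over "car".
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocab_set:
                tags.add(phrase)
                i += n
                break
        else:
            i += 1  # no tag starts at this word; move on
    return tags


def multi_hot(tags, vocab=TAG_VOCAB):
    """Turn a set of parsed tags into a multi-label target vector."""
    return [1.0 if t in tags else 0.0 for t in vocab]


def bce_tagging_loss(logits, targets):
    """Mean binary cross-entropy over tags, in the numerically stable logit form."""
    total = 0.0
    for z, y in zip(logits, targets):
        # max(z, 0) - z*y + log(1 + exp(-|z|)) equals BCE(sigmoid(z), y)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(caption_loss, tag_logits, tag_targets, weight=1.0):
    """Unified objective: captioning loss plus a weighted tagging loss."""
    return caption_loss + weight * bce_tagging_loss(tag_logits, tag_targets)


if __name__ == "__main__":
    caption = "A man throws a frisbee to two dogs in the park"
    tags = parse_tags(caption)
    print(sorted(tags))
    print(multi_hot(tags))
```

For the example caption the parser yields the tags `dog`, `frisbee`, `man` and `park`, so the caption text supervises the captioning branch while the multi-hot vector supervises the tagging branch. No manual annotation is involved, which is the point the abstract makes about training on image-text pairs.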

Citation impact

Total citations: 128

FWCI: 28.55
Percentile: 100%
References: 28

Authors: 12

Topics & keywords

Topics

Image Retrieval and Classification Techniques (69%)

Keywords

Computer science
Artificial intelligence
Image (mathematics)
Computer vision

No related works found for this paper.