Article | Jun 17, 2024 | Closed access

Recognize Anything: A Strong Image Tagging Model

Indexed in Crossref

Abstract

We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for foundation models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. By leveraging large-scale image-text pairs for training instead of manual annotations, RAM introduces a new paradigm for image tagging. The development of RAM comprises four key steps. Firstly, annotation-free image tags are obtained at scale through automatic text semantic parsing. Subsequently, a preliminary model is trained for automatic annotation by unifying the captioning and tagging tasks, supervised by the original texts and parsed tags, respectively.…
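
The first step in the abstract, obtaining tags from caption text without manual annotation, can be illustrated with a toy sketch. This is not the authors' parser: the vocabulary, the function names `parse_tags` and `build_training_pairs`, and the naive plural handling are all assumptions made for illustration, and a real system would presumably use a proper semantic parser and a much larger label set.

```python
import re

# Hypothetical tag vocabulary (assumption); the paper's label system is not given here.
TAG_VOCABULARY = {"dog", "frisbee", "grass", "park", "man", "bicycle"}


def _normalize(token, vocabulary):
    """Map a token to a vocabulary tag, or return None if it does not match."""
    if token in vocabulary:
        return token
    # Naive plural handling: drop one trailing "s" if that yields a known tag.
    if token.endswith("s") and token[:-1] in vocabulary:
        return token[:-1]
    return None


def parse_tags(caption, vocabulary=TAG_VOCABULARY):
    """Return the vocabulary tags mentioned in a caption, in order of first appearance."""
    tags = []
    for token in re.findall(r"[a-z]+", caption.lower()):
        tag = _normalize(token, vocabulary)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags


def build_training_pairs(captions):
    """Pair each caption with its parsed tags.

    The caption would supervise a captioning objective and the tags a
    tagging objective, mirroring the split described in the abstract.
    """
    return [(caption, parse_tags(caption)) for caption in captions]
```

Under these assumptions, `parse_tags("A dog catches a frisbee on the grass.")` yields the tags `dog`, `frisbee`, and `grass`, and each image's caption and parsed tags could then serve as the two supervision signals the abstract mentions.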

Citation impact

Total citations: 128
FWCI: 28.55
Percentile: 100%
References: 28
Authors

12 authors

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Image (mathematics)
  • Computer vision