Article | Jun 17, 2024 | Closed access
Recognize Anything: A Strong Image Tagging Model
Indexed in Crossref
Abstract
We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for foundation models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. By leveraging large-scale image-text pairs for training instead of manual annotations, RAM introduces a new paradigm for image tagging.
The development of RAM comprises four key steps. Firstly, annotation-free image tags are obtained at scale through automatic text semantic parsing. Subsequently, a preliminary model is trained for automatic annotation by unifying the captioning and tagging tasks, supervised by the original texts and parsed tags, respectively.…
Citation impact
- Total citations: 128
- FWCI: 28.55
- Percentile: 100%
- References: 28
Authors
12
Topics & keywords
Keywords
- Computer science
- Artificial intelligence
- Image (mathematics)
- Computer vision