CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels

Li, Siyuan; Li, Sun; Li, Qingli

doi:10.1609/aaai.v37i1.25225

Article; Proceedings of the AAAI Conference on Artificial Intelligence; Jun 26, 2023; Diamond OA

East China Normal University

Indexed in Crossref

Abstract

Pre-trained vision-language models like CLIP have recently shown superior performance on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes that lack concrete text descriptions, so it remains to be determined how such models could be applied to these tasks. This paper first finds that simply fine-tuning the visual model initialized by the image encoder in CLIP already obtains competitive performance on various ReID tasks. Then we propose a two-stage strategy to facilitate a better visual representation. The key idea is to fully exploit the cross-modal description ability in CLIP through a…

Citation impact

Total citations: 232

FWCI: 13.67
Percentile: 100%
References: 92

Keywords

Computer science
Encoder
Embedding
Identification (biology)
Feature (linguistics)
Artificial intelligence
Image (mathematics)
Code (set theory)

Funding

Science and Technology Commission of Shanghai Municipality
Awards: 22DZ2229004, 19511120800