Article · IEEE Transactions on Image Processing · Jan 1, 2023 · Closed access

CLIP-Driven Fine-Grained Text-Image Person Re-Identification

Nanjing University of Science and Technology · Nanjing University of Aeronautics and Astronautics

Indexed in: Crossref, PubMed

Abstract

Text-Image Person Re-identification (TIReID) aims to retrieve the image corresponding to the given text query from a pool of candidate images. Existing methods employ prior knowledge from single-modality pre-training to facilitate learning, but lack multi-modal correspondence information. Vision-Language Pre-training, such as CLIP (Contrastive Language-Image Pretraining), can address the limitation. However, CLIP falls short in capturing fine-grained information, thereby not fully leveraging its powerful capacity in TIReID. Besides, the popular explicit local matching paradigm for mining fine-grained information heavily relies on the quality of local parts and cross-modal inter-part interaction/guidance,…

Citation impact

  • Total citations: 275
  • FWCI: 31.30
  • Percentile: 100%
  • References: 70

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Discriminative model
  • Artificial intelligence
  • Feature (linguistics)
  • Modality (human–computer interaction)
  • Inference
  • Sentence
  • Feature learning
UN Sustainable Development Goals
  • Reduced inequalities

Funding