Article · Jun 1, 2023 · Closed access

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

Wuhan University

Indexed in Crossref

Abstract

Text-to-image person retrieval aims to identify the target person based on a given textual description query. The primary challenge is to learn the mapping of visual and textual modalities into a common latent space. Prior works have attempted to address this challenge by leveraging separately pre-trained unimodal models to extract visual and textual features. However, these approaches lack the necessary underlying alignment capabilities required to match multimodal data effectively. Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning…
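The idea of matching in a common latent space can be illustrated with a minimal sketch. It assumes that a text query and each gallery image have already been encoded into vectors of the same dimension, and it ranks images by cosine similarity to the query. The function names, image ids, and embedding values below are hypothetical and hand-written for illustration; this is not the IRRA model or its training objective, only the generic retrieval step that such a shared space makes possible.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(text_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the text query."""
    scores = {
        image_id: cosine_similarity(text_embedding, embedding)
        for image_id, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings; in practice these would come from trained encoders.
text_query = [0.9, 0.1, 0.0]
gallery = {
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}

ranking = rank_gallery(text_query, gallery)
print(ranking)
```

In this toy setup the image whose vector points in nearly the same direction as the query is ranked first. How well such a ranking works in practice depends on how well the two encoders are aligned, which is the challenge the abstract describes.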

Citation impact

Total citations: 267
FWCI: 30.39
Percentile: 100%
References: 77

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Similarity (geometry)
  • Matching (statistics)
  • Relation (database)
  • Visual reasoning
  • Modality (human–computer interaction)
  • Margin (machine learning)
UN Sustainable Development Goals
  • Quality Education

Funding