Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

Jiang, Ding; Ye, Mang

doi:10.1109/cvpr52729.2023.00273

Article; Jun 1, 2023; Closed access

Ding Jiang; Mang Ye

Wuhan University

Indexed in Crossref

Abstract

Text-to-image person retrieval aims to identify the target person based on a given textual description query. The primary challenge is to learn the mapping of visual and textual modalities into a common latent space. Prior works have attempted to address this challenge by leveraging separately pre-trained unimodal models to extract visual and textual features. However, these approaches lack the necessary underlying alignment capabilities required to match multimodal data effectively. Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning…
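The retrieval task described in the abstract can be illustrated with a minimal sketch, assuming text and image features have already been projected into a common latent space. This is not the IRRA method itself: the embeddings below are toy values, and the names `cosine_similarity` and `rank_gallery` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(text_embedding, image_embeddings):
    """Rank gallery images by similarity to the text query, best match first.

    image_embeddings maps an image id to its embedding in the shared space.
    """
    scores = [
        (image_id, cosine_similarity(text_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, not taken from the paper).
query = [1.0, 0.0, 1.0]
gallery = {
    "img_a": [0.9, 0.1, 0.8],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.5, 0.5, 0.5],
}
ranking = rank_gallery(query, gallery)
ranked_ids = [image_id for image_id, _ in ranking]
```

In this sketch the quality of the ranking depends entirely on how well the two encoders are aligned, which is the challenge the abstract identifies.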

Citation impact

Total citations: 267

FWCI: 30.39
Percentile: 100%
References: 77

Authors: 2

Topics & keywords

Keywords

Computer science
Artificial intelligence
Similarity (geometry)
Matching (statistics)
Relation (database)
Visual reasoning
Modality (human–computer interaction)
Margin (machine learning)

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Award: 62176188