Article, Jun 1, 2023, Closed access
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Indexed in Crossref
Abstract
Text-to-image person retrieval aims to identify the target person based on a given textual description query. The primary challenge is to learn the mapping of visual and textual modalities into a common latent space. Prior works have attempted to address this challenge by leveraging separately pre-trained unimodal models to extract visual and textual features. However, these approaches lack the necessary underlying alignment capabilities required to match multimodal data effectively. Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning…
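As a rough illustration of the retrieval setting the abstract describes (not the IRRA method itself), the sketch below ranks gallery images against a text query once both have been mapped into a common latent space. The embeddings are hand-written toy vectors, the names `cosine_similarity` and `rank_gallery` are hypothetical, and the encoders that would produce such embeddings are assumed and not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(text_embedding, image_embeddings):
    """Return (image_id, score) pairs sorted from best to worst match.

    Assumes the text and image embeddings already live in the same
    latent space; learning that mapping is the hard part and is not
    covered here.
    """
    scores = [
        (image_id, cosine_similarity(text_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: a 3-dimensional shared space with made-up values.
query = [0.9, 0.1, 0.0]
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.7, 0.7, 0.0],
}

ranking = rank_gallery(query, gallery)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

Cosine similarity is used here only because it is a common choice for comparing embeddings; the truncated abstract does not say which similarity or loss the paper uses.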
Citation impact
- Total citations: 267
- FWCI: 30.39
- Percentile: 100%
- References: 77
Authors: 2
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Similarity (geometry)
- Matching (statistics)
- Relation (database)
- Visual reasoning
- Modality (human–computer interaction)
- Margin (machine learning)
UN Sustainable Development Goals
- Quality Education