Dual-path Convolutional Image-Text Embeddings with Instance Loss

Zheng, Zhedong; Zheng, Liang; Garrett, Michael; Yang, Yi; Xu, Mingliang; Shen, Yi-Dong

doi:10.1145/3383184

Article · ACM Transactions on Multimedia Computing, Communications, and Applications · May 22, 2020 · Green OA

University of Technology Sydney · Australian National University · +4 more institutions

Indexed in: arXiv, Crossref

Abstract

Matching images and sentences demands a fine-grained understanding of both modalities. In this article, we propose a new system to discriminatively embed images and text into a shared visual-textual space. Most existing works in this field apply the ranking loss to pull positive image/text pairs close together and push negative pairs apart. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning of training. A naive use of the ranking loss may therefore hinder the network from learning inter-modal relationships. To address this problem, we propose the instance loss, which…
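To make the ranking loss mentioned in the abstract concrete, the following is a minimal sketch of a generic bidirectional hinge (triplet) ranking loss over a batch of paired embeddings. It is a textbook formulation and not necessarily the exact loss used in the article; the function names `cosine` and `ranking_loss`, the use of cosine similarity, and the `margin` value are all assumptions made for illustration. The instance loss is not sketched, because its description is cut off in the abstract above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(image_embs, text_embs, margin=0.2):
    """Bidirectional hinge ranking loss averaged over the batch.

    image_embs[i] and text_embs[i] form a positive pair; every other
    pairing in the batch is treated as a negative. `margin` is an
    assumed hyperparameter, not a value taken from the article.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        pos = cosine(image_embs[i], text_embs[i])
        for j in range(n):
            if j == i:
                continue
            # Image i as anchor, text j as negative.
            total += max(0.0, margin - pos + cosine(image_embs[i], text_embs[j]))
            # Text i as anchor, image j as negative.
            total += max(0.0, margin - pos + cosine(text_embs[i], image_embs[j]))
    return total / n
```

With perfectly aligned toy embeddings such as `[[1, 0], [0, 1]]` for both modalities, every hinge term is inactive and the loss is zero; swapping the two text vectors makes every term active. Early in training, when the two modalities are embedded by unrelated networks, most pairings resemble the second case, which is one way to read the abstract's remark that appropriate triplets are hard to find at the beginning.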

Citation impact

Total citations: 511

FWCI: 24.91
Percentile: 100%
References: 64

Authors

6 authors

Zhedong Zheng (corresponding author), University of Technology Sydney
Liang Zheng, Australian National University
Michael Garrett, Edith Cowan University
Yi Yang, University of Technology Sydney
Mingliang Xu, Zhengzhou University

Topics & keywords

Keywords

Ranking (information retrieval)
Discriminative model
Initialization
Word2vec
Image (mathematics)
Embedding
Learning to rank
Granularity
