Dual-path Convolutional Image-Text Embeddings with Instance Loss
University of Technology Sydney · Australian National University · +4 more institutions
Abstract
Matching images and sentences demands a fine understanding of both modalities. In this article, we propose a new system to discriminatively embed the image and text into a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image/text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning. As a result, naively using the ranking loss may hinder the network from learning the inter-modal relationship. To address this problem, we propose the instance loss, which…
Citation impact
- FWCI: 24.91
- Percentile: 100%
- References: 64
Authors (6)
- Zhedong Zheng (corresponding author)
University of Technology Sydney
- Liang Zheng
Australian National University
- Michael Garrett
Edith Cowan University
- Yi Yang
University of Technology Sydney
- Mingliang Xu
Zhengzhou University
Topics & keywords
- Ranking (information retrieval)
- Discriminative model
- Initialization
- Word2vec
- Image (mathematics)
- Embedding
- Learning to rank
- Granularity