Dual-path Convolutional Image-Text Embeddings with Instance Loss

Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu

University of Technology Sydney · Australian National University · +4 more institutions

Indexed in: arXiv, Crossref

Abstract

Matching images and sentences demands a fine understanding of both modalities. In this article, we propose a new system to discriminatively embed images and text into a shared visual-textual space. Most existing works in this field apply the ranking loss to pull positive image/text pairs close and push negative pairs apart. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning of training. As a result, naively applying the ranking loss may hinder the network from learning inter-modal relationships. To address this problem, we propose the instance loss, which…
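
As a rough illustration of the ranking loss the abstract refers to, the sketch below computes a bidirectional hinge-based ranking loss over a small batch of paired embeddings. It is not taken from the paper: the function names, the use of cosine similarity, the margin of 0.2, and the choice to treat every other item in the batch as a negative are all assumptions made for the example. The instance loss itself is not sketched, since the abstract is cut off before describing it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(image_embs, text_embs, margin=0.2):
    """Bidirectional hinge ranking loss (illustrative, hypothetical helper).

    image_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other index in the batch is used as a negative.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        pos = cosine(image_embs[i], text_embs[i])
        for j in range(n):
            if j == i:
                continue
            # Image i as anchor, text j as the negative.
            total += max(0.0, margin - pos + cosine(image_embs[i], text_embs[j]))
            # Text i as anchor, image j as the negative.
            total += max(0.0, margin - pos + cosine(text_embs[i], image_embs[j]))
    return total / n


# Toy 2-D embeddings: aligned pairs incur no loss, swapped pairs are penalized.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = ranking_loss(images, aligned_texts)
loss_swapped = ranking_loss(images, swapped_texts)
```

The loss is zero only when each positive pair is more similar than every negative pair by at least the margin. Early in training, when image and text features come from unrelated networks, the similarities carry little signal, which is one way to read the abstract's remark that appropriate triplets are hard to find at the beginning.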

Citation impact

Total citations: 511
FWCI: 24.91
Percentile: 100%
References: 64

Authors

6
  • Zhedong Zheng (corresponding author), University of Technology Sydney
  • Liang Zheng, Australian National University
  • Michael Garrett, Edith Cowan University
  • Yi Yang, University of Technology Sydney
  • Mingliang Xu, Zhengzhou University

Topics & keywords

Keywords
  • Ranking (information retrieval)
  • Discriminative model
  • Initialization
  • Word2vec
  • Image (mathematics)
  • Embedding
  • Learning to rank
  • Granularity