VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

Faghri, Fartash; Fleet, David J.; Kiros, Jamie; Fidler, Sanja

doi:10.48550/arxiv.1707.05612

Preprint, arXiv (Cornell University), Jul 18, 2017, Green OA

Indexed in: arXiv, DataCite

Abstract

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and use of augmented data, yields significant gains in retrieval performance. We showcase our approach, VSE++, on MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (at R@1).
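The abstract does not state the loss itself, so the following is only a sketch of one reading of the "simple change": a hinge-based ranking loss that normally sums violations over all negatives in a batch is changed to keep only the hardest (highest-scoring) negative per query. The function name `hinge_ranking_losses`, the `margin` default of 0.2, and the convention that `scores[i][j]` is the similarity between image `i` and caption `j` (with matching pairs on the diagonal) are assumptions made for illustration, not details taken from this record.

```python
from typing import List, Tuple


def hinge_ranking_losses(
    scores: List[List[float]], margin: float = 0.2
) -> Tuple[float, float]:
    """Return (sum_of_hinges, max_of_hinges) for a square similarity matrix.

    scores[i][j] is the similarity of image i and caption j; the diagonal
    holds the matching (positive) pairs. The margin value is an assumption.
    """
    n = len(scores)
    sum_loss = 0.0  # conventional loss: every violating negative contributes
    max_loss = 0.0  # hard-negative variant: only the worst violation counts
    for i in range(n):
        pos = scores[i][i]
        # Caption retrieval: image i is the query, captions j != i are negatives.
        cap_viol = [max(0.0, margin + scores[i][j] - pos)
                    for j in range(n) if j != i]
        # Image retrieval: caption i is the query, images j != i are negatives.
        img_viol = [max(0.0, margin + scores[j][i] - pos)
                    for j in range(n) if j != i]
        sum_loss += sum(cap_viol) + sum(img_viol)
        max_loss += max(cap_viol, default=0.0) + max(img_viol, default=0.0)
    return sum_loss, max_loss
```

Because each per-query maximum is one of the non-negative terms in the corresponding sum, the hard-negative value can never exceed the summed value; the two coincide when at most one negative per query violates the margin.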

Citation impact

Total citations: 580

FWCI: —
Percentile: —
References: 29

Authors: 4

Topics & keywords

Keywords

Computer science
Negative
Artificial intelligence
Information retrieval
Art
Visual arts

UN Sustainable Development Goals

Quality Education
