articleDec 5, 2013Closed access

DeViSE: A Deep Visual-Semantic Embedding Model

Google (United States)

Abstract

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources – such as text data – both to train visual models and to constrain their pre-dictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as seman-tic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet…

Citation impact

2,062
total citations
FWCI
157.16
Percentile
100%
References
21
Citations per year

Authors

7

Topics & keywords

Keywords
  • Computer science
  • Leverage (statistics)
  • Embedding
  • Artificial intelligence
  • Class (philosophy)
  • Object (grammar)
  • Visualization
  • Cognitive neuroscience of visual object recognition
UN Sustainable Development Goals
  • Quality Education
No related works found for this paper.