Article · Jun 1, 2016 · Closed access

Learning Deep Representations of Fine-Grained Visual Descriptions

University of Michigan–Ann Arbor · Max Planck Institute for Informatics

Indexed in Crossref

Abstract

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations, the current best complement to visual features is attributes: manually encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch, i.e., without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and…

Citation impact

Total citations: 792
FWCI: 99.83
Percentile: 100%
References: 85

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Embedding
  • Encoding (memory)
  • Natural language processing
  • Salient
  • Inference
  • Visualization
UN Sustainable Development Goals
  • Quality Education

Funding