Learning Deep Representations of Fine-Grained Visual Descriptions

Reed, Scott; Akata, Zeynep; Lee, Honglak; Schiele, Bernt

doi:10.1109/cvpr.2016.13

Article · Jun 1, 2016 · Closed access

University of Michigan–Ann Arbor · Max Planck Institute for Informatics

Indexed in Crossref

Abstract

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch, i.e. without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and…

Citation impact

Total citations: 792

FWCI: 99.83
Percentile: 100%
References: 85

Authors: 4

Topics & keywords

Keywords

Computer science
Artificial intelligence
Embedding
Encoding (memory)
Natural language processing
Salient
Inference
Visualization

UN Sustainable Development Goals

Quality Education
