Learning Deep Representations of Fine-Grained Visual Descriptions
University of Michigan–Ann Arbor · Max Planck Institute for Informatics
Abstract
State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations, the current best complement to visual features is attributes: manually encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch, i.e., without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and…
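The joint-embedding formulation mentioned in the abstract can be illustrated with a small sketch. It assumes a linear image projection `W` and a dot-product compatibility score between the projected image feature and a class's side-information vector (here, attribute vectors). A test image is assigned to the class with the highest score, which is what lets the model label classes it never saw during training. All names (`compatibility`, `zero_shot_predict`), attributes, classes, and numbers below are hypothetical toy values for illustration, not the paper's model or data.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(W: List[List[float]], v: Sequence[float]) -> List[float]:
    """Map an image feature into the embedding space (hypothetical linear encoder W @ v)."""
    return [dot(row, v) for row in W]


def compatibility(image_feat: Sequence[float],
                  class_emb: Sequence[float],
                  W: List[List[float]]) -> float:
    """Score F(v, t) = (W v)^T t between an image feature v and class embedding t."""
    return dot(project(W, image_feat), class_emb)


def zero_shot_predict(image_feat: Sequence[float],
                      class_embs: Dict[str, Sequence[float]],
                      W: List[List[float]]) -> str:
    """Return the class whose side-information vector scores highest for this image."""
    return max(class_embs, key=lambda c: compatibility(image_feat, class_embs[c], W))


# Toy attribute vectors (hypothetical): [has_red, has_yellow, has_crest]
classes = {
    "cardinal": [1.0, 0.0, 1.0],
    "goldfinch": [0.0, 1.0, 0.0],
    "blue_jay": [0.0, 0.0, 1.0],
}
W_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image = [0.9, 0.1, 0.8]  # hypothetical image feature already aligned with attribute space

print(zero_shot_predict(image, classes, W_identity))  # cardinal
```

With an identity projection, the scores are 1.7 (cardinal), 0.1 (goldfinch), and 0.8 (blue_jay). In a learned model, `W` would be trained on seen classes. The abstract's proposal replaces the hand-built attribute vectors with embeddings produced by a neural language model reading text descriptions.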
Citation impact
- FWCI: 99.83
- Percentile: 100%
- References: 85
Authors: 4
Topics & keywords
- Computer science
- Artificial intelligence
- Embedding
- Encoding (memory)
- Natural language processing
- Salient
- Inference
- Visualization
- Quality Education