BabyTalk: Understanding and Generating Simple Image Descriptions
Indexed in: Crossref, PubMed
Abstract
We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second part, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human-generated reference descriptions. We also…
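The two-part design described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the linear interpolation used for smoothing, the fixed sentence template, the function names `plan_content` and `realize`, and all numbers are assumptions made up for illustration. It only shows the general idea of letting text statistics correct noisy vision scores before a sentence is produced.

```python
from typing import Dict


def plan_content(vision_scores: Dict[str, float],
                 text_prior: Dict[str, float],
                 weight: float = 0.5) -> str:
    """Toy content planning: blend vision scores with a text-derived prior.

    The linear interpolation here is an assumed stand-in for smoothing,
    not the method used in the paper.
    """
    smoothed = {
        word: (1.0 - weight) * score + weight * text_prior.get(word, 0.0)
        for word, score in vision_scores.items()
    }
    # Pick the candidate word with the highest blended score.
    return max(smoothed, key=smoothed.get)


def realize(attribute: str, noun: str) -> str:
    """Toy surface realization: fill a fixed, hypothetical template."""
    article = "an" if attribute[0].lower() in "aeiou" else "a"
    return f"This is a picture of {article} {attribute} {noun}."


# Made-up attribute scores for a detected dog: vision slightly prefers "green".
vision_scores = {"green": 0.45, "brown": 0.40}
# Made-up text statistics: dogs are rarely described as green.
text_prior = {"green": 0.01, "brown": 0.60}

attribute = plan_content(vision_scores, text_prior)
sentence = realize(attribute, "dog")
print(sentence)
```

With these invented numbers, the vision scores alone would pick "green", while the blended score picks "brown", which is the kind of correction the content planning stage is meant to provide.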
Citation impact
- Total citations: 871
- FWCI: 35.65
- Percentile: 100%
- References: 64
Authors: 8
Topics & keywords
Keywords
- Computer science
- Realization (probability)
- Artificial intelligence
- Natural language processing
- Natural language
- Similarity (geometry)
- Image (mathematics)
- Construct (python library)
UN Sustainable Development Goals
- Quality Education