BabyTalk: Understanding and Generating Simple Image Descriptions

Stony Brook University

Indexed in: Crossref, PubMed

Abstract

We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second part, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human-generated reference descriptions. We also…
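The two-part structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the model used for content planning or the realization methods, so the log-linear score combination, the single fixed template, and every name and number below (`detector_scores`, `text_prior`, `plan_content`, `realize`, `prior_weight`) are assumptions made up for illustration.

```python
import math

# Hypothetical detector confidences for one image (illustrative values only).
detector_scores = {
    "object": {"dog": 0.60, "cat": 0.40},
    "attribute": {"striped": 0.55, "brown": 0.45},
}

# Hypothetical P(attribute | object), standing in for statistics
# mined from a large pool of visually descriptive text.
text_prior = {
    ("dog", "striped"): 0.05,
    ("dog", "brown"): 0.60,
    ("cat", "striped"): 0.50,
    ("cat", "brown"): 0.30,
}


def plan_content(scores, prior, prior_weight=1.0, floor=1e-6):
    """Content planning (assumed form): pick the (object, attribute) pair
    that maximizes detector log-scores plus a weighted text-prior log-score."""
    best_pair, best_score = None, -math.inf
    for obj, p_obj in scores["object"].items():
        for attr, p_attr in scores["attribute"].items():
            p_text = prior.get((obj, attr), floor)  # unseen pairs get a small floor
            score = math.log(p_obj) + math.log(p_attr) + prior_weight * math.log(p_text)
            if score > best_score:
                best_pair, best_score = (obj, attr), score
    return best_pair


def realize(obj, attr):
    """Surface realization (assumed form): fill one fixed sentence template."""
    article = "an" if attr[0].lower() in "aeiou" else "a"
    return f"There is {article} {attr} {obj}."


obj, attr = plan_content(detector_scores, text_prior)
print(realize(obj, attr))
```

With these made-up numbers, the detectors alone (`prior_weight=0.0`) favor "striped dog", while adding the text prior shifts the choice to "brown dog", which is the kind of smoothing the content planning step is described as performing.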

Citation impact

Total citations: 871
FWCI: 35.65
Percentile: 100%
References: 64

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Realization (probability)
  • Artificial intelligence
  • Natural language processing
  • Natural language
  • Similarity (geometry)
  • Image (mathematics)
  • Construct (python library)
UN Sustainable Development Goals
  • Quality Education