BabyTalk: Understanding and Generating Simple Image Descriptions

Kulkarni, Girish; Premraj, Visruth; Ordóñez, Vicente; Dhar, Sagnik; Li, Siming; Choi, Yejin; Berg, Alexander C.; Berg, Tamara L.

doi:10.1109/tpami.2012.162

Article. IEEE Transactions on Pattern Analysis and Machine Intelligence. May 31, 2013. Closed access.

Stony Brook University

Indexed in: Crossref, PubMed

Abstract

We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second step, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human generated reference descriptions. We also…

Citation impact

Total citations: 871

FWCI: 35.65
Percentile: 100%
References: 64

Authors: 8

Topics & keywords

Keywords

Computer science
Realization (probability)
Artificial intelligence
Natural language processing
Natural language
Similarity (geometry)
Image (mathematics)
Construct (python library)

UN Sustainable Development Goals

Quality Education
