DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Johnson, Justin; Karpathy, Andrej; Fei-Fei, Li

doi:10.1109/cvpr.2016.494

Preprint | Jun 1, 2016 | Closed access

Justin Johnson, Andrej Karpathy, Li Fei-Fei

Stanford University

Indexed in Crossref

Abstract

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and image captioning when one predicted region covers the full image. To address the localization and description task jointly, we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization…
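
To make the two special cases in the abstract concrete, the following is a minimal sketch of what a dense captioning output could look like and how it reduces to object detection or image captioning. The `RegionCaption` container, the `(x, y, width, height)` box format, the score field, and both helper functions are hypothetical names chosen for illustration; none of them come from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionCaption:
    """One dense-captioning prediction (hypothetical container, not from the paper)."""
    box: Tuple[float, float, float, float]  # assumed format: (x, y, width, height) in pixels
    caption: str                            # natural-language description of the region
    score: float                            # confidence value; its range is an assumption


def looks_like_detection(preds: List[RegionCaption]) -> bool:
    """True when every description is a single word (the object-detection special case)."""
    return bool(preds) and all(len(p.caption.split()) == 1 for p in preds)


def looks_like_image_captioning(preds: List[RegionCaption],
                                image_size: Tuple[int, int]) -> bool:
    """True when there is exactly one region and it covers the full image."""
    width, height = image_size
    return len(preds) == 1 and preds[0].box == (0.0, 0.0, float(width), float(height))


# Illustrative, made-up predictions for a 640x480 image.
dense = [
    RegionCaption((12, 30, 200, 150), "a man riding a horse", 0.91),
    RegionCaption((300, 40, 120, 90), "white clouds in the sky", 0.84),
]
detection = [RegionCaption((12, 30, 200, 150), "horse", 0.95)]
captioning = [RegionCaption((0, 0, 640, 480), "a man riding a horse in a field", 0.88)]
```

In this sketch, `dense` is the general case (several regions, each with a free-form phrase), `detection` is the case where each description collapses to one word, and `captioning` is the case where a single region spans the whole image.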

Citation impact

Total citations: 1,194

FWCI: 90.56
Percentile: 100%
References: 87

Authors: 3

Topics & keywords

Keywords

Closed captioning
Computer science
Convolutional neural network
Artificial intelligence
Task (project management)
Word (group theory)
Salient
Image (mathematics)
