Preprint · Jun 1, 2016 · Closed access

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Stanford University

Indexed in Crossref

Abstract

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and image captioning when one predicted region covers the full image. To address the localization and description task jointly, we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization…
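The two special cases named in the abstract can be illustrated with a minimal sketch of what a dense captioning output might look like. This is not the paper's code: the `RegionCaption` type, the `(x, y, width, height)` box convention, and both helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RegionCaption:
    """One dense captioning prediction (hypothetical structure)."""
    box: Tuple[int, int, int, int]  # assumed convention: (x, y, width, height)
    caption: str                    # natural-language description of the region


def is_detection_like(regions: List[RegionCaption]) -> bool:
    """True when every description is a single word, the case the
    abstract relates to object detection."""
    return bool(regions) and all(len(r.caption.split()) == 1 for r in regions)


def is_full_image_caption(regions: List[RegionCaption],
                          image_width: int, image_height: int) -> bool:
    """True when there is exactly one region and it covers the full image,
    the case the abstract relates to image captioning."""
    return (len(regions) == 1
            and regions[0].box == (0, 0, image_width, image_height))


# General case: several regions, each with a free-form description.
dense_output = [
    RegionCaption((30, 40, 200, 150), "man riding a brown horse"),
    RegionCaption((0, 300, 640, 180), "green grass on the ground"),
]

# Special cases described in the abstract.
detection_output = [
    RegionCaption((30, 40, 200, 150), "horse"),
    RegionCaption((120, 10, 80, 120), "man"),
]
captioning_output = [RegionCaption((0, 0, 640, 480), "a man rides a horse in a field")]
```

Under these assumptions, the general output is a list of (box, description) pairs, and the two established tasks fall out as restrictions on that list: one-word descriptions on one side, a single full-image region on the other.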

Citation impact

Total citations: 1,194
FWCI: 90.56
Percentile: 100%
References: 87

Authors

3

Topics & keywords

Keywords
  • Closed captioning
  • Computer science
  • Convolutional neural network
  • Artificial intelligence
  • Task (project management)
  • Word (group theory)
  • Salient
  • Image (mathematics)

Funding