Boosting Image Captioning with Attributes
Microsoft Research Asia (China) · University of Science and Technology of China · +1 more institution
Abstract
Automatically describing an image in natural language has been an emerging challenge in both computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A), a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs) image captioning framework by training them in an end-to-end manner. In particular, the learning of attributes is strengthened by integrating inter-attribute correlations into Multiple Instance Learning (MIL). To incorporate attributes into captioning, we construct variants of architectures by feeding image representations and attributes into…
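The abstract does not say which MIL formulation is used for attribute learning. As a rough illustration only, the sketch below assumes the widely used noisy-OR pooling, in which an image (the "bag") is treated as containing an attribute if at least one of its regions does. The function names, the attribute names, and the per-region probabilities are all hypothetical, and the inter-attribute correlation term mentioned in the abstract is not modeled here.

```python
from math import prod


def noisy_or(region_probs):
    """Image-level probability that an attribute is present.

    Assumes noisy-OR pooling: the attribute is absent from the image only
    if it is absent from every region, so
    p_image = 1 - prod_i (1 - p_region_i).
    """
    return 1.0 - prod(1.0 - p for p in region_probs)


def image_attribute_scores(region_scores):
    """Map each attribute to its pooled image-level probability.

    region_scores: dict of attribute name -> list of per-region
    probabilities (hypothetical values, e.g. from a CNN applied to regions).
    """
    return {attr: noisy_or(probs) for attr, probs in region_scores.items()}


# Hypothetical per-region probabilities for two attributes.
example = {
    "dog": [0.1, 0.8, 0.2],
    "frisbee": [0.05, 0.05, 0.1],
}
scores = image_attribute_scores(example)
```

A vector of such image-level attribute probabilities is the kind of input that could be fed, alongside the CNN image representation, into a recurrent caption decoder; how the paper actually combines the two is not recoverable from the truncated abstract.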
Citation impact
- FWCI: 36.66
- Percentile: 100%
- References: 62
Authors
- 5
Topics & keywords
- Closed captioning
- Computer science
- Boosting (machine learning)
- Artificial intelligence
- Recurrent neural network
- Convolutional neural network
- Image (mathematics)
- Natural language
- Quality Education