Image Captioning via Compact Bidirectional Architecture
Anhui University · National Science Centre · Hefei University of Technology · Jingdong (China) · University of Science and Technology of China
Abstract
Most current image captioning models generate captions from left to right. This unidirectional property means they can leverage only past context, not future context. Although refinement-based models can exploit both past and future context by generating a new caption in a second stage from captions retrieved or generated in a first stage, their decoders generally consist of two networks (i.e., a retriever or captioner in the first stage and a captioner in the second stage) that can only be executed sequentially. In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly…
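To make the past/future distinction concrete, the following is a minimal, generic sketch of what a unidirectional decoder can condition on. It is not the authors' Compact Bidirectional Transformer; the function names (`causal_mask`, `reverse_caption`, `visible_context`) are hypothetical and chosen only for illustration. A left-to-right decoder uses a causal mask, so when predicting a word it sees only the words before it; decoding the reversed caption with the same mask exposes the words after it instead.

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Under left-to-right decoding, token i sees only tokens 0..i (past context).
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def reverse_caption(tokens: List[str]) -> List[str]:
    """Right-to-left target: the same caption read backwards.

    Applying a causal mask to this sequence lets each step see the words
    that come *after* it in the original caption.
    """
    return list(reversed(tokens))


def visible_context(tokens: List[str], i: int, direction: str = "l2r") -> List[str]:
    """Words a unidirectional decoder has already generated when it predicts
    position i of the original caption (returned in original order)."""
    if direction == "l2r":
        return tokens[:i]          # past context only
    if direction == "r2l":
        return tokens[i + 1:]      # future context only
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    caption = ["a", "dog", "on", "grass"]
    print(visible_context(caption, 2, "l2r"))  # ['a', 'dog']
    print(visible_context(caption, 2, "r2l"))  # ['grass']
```

Neither direction alone sees the full surrounding sentence for the word "on"; only the union of the two contexts covers both sides, which is the gap that bidirectional captioning models aim to close.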
Authors (7)
- Zijie Song (corresponding author)
Anhui University
- Yuanen Zhou
National Science Centre
- Zhenzhen Hu
Hefei University of Technology
- Daqing Liu
Jingdong (China)
- Huixia Ben
University of Science and Technology of China
Topics & keywords
- Closed captioning
- Computer science
- Sentence
- Transformer
- Leverage (statistics)
- Language model
- Exploit
- Artificial intelligence