Image Captioning via Compact Bidirectional Architecture

Song, Zijie; Zhou, Yuanen; Hu, Zhenzhen; Liu, Daqing; Ben, Huixia; Hong, Richang; Wang, Meng

doi:10.1109/tmm.2026.3680397

Preprint · IEEE Transactions on Multimedia · Jan 1, 2026 · Hybrid OA

Anhui University · National Science Centre · +3 more institutions

Indexed in: arXiv, Crossref, DataCite

Abstract

Most current image captioning models generate captions from left to right. Because of this unidirectional property, they can leverage only past context, not future context. Refinement-based models can exploit both past and future context by generating a new caption in a second stage from captions pre-retrieved or pre-generated in a first stage, but their decoder generally consists of two networks (i.e., a retriever or captioner in the first stage and a captioner in the second stage) that can only be executed sequentially. In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly…

Citation impact

Total citations: 14

FWCI: 0.00
Percentile: 98%
References: 0

Authors (7)

Zijie Song (corresponding author), Anhui University
Yuanen Zhou, National Science Centre
Zhenzhen Hu, Hefei University of Technology
Daqing Liu, Jingdong (China)
Huixia Ben, University of Science and Technology of China

Topics & keywords

Keywords

Closed captioning
Computer science
Sentence
Transformer
Leverage (statistics)
Language model
Exploit
Artificial intelligence
