Preprint · IEEE Transactions on Multimedia · Jan 1, 2026 · Hybrid OA

Image Captioning via Compact Bidirectional Architecture

Zijie Song · Yuanen Zhou · Zhenzhen Hu · Daqing Liu · Huixia Ben

Anhui University · National Science Centre · +3 more institutions

Indexed in: arXiv · Crossref · DataCite

Abstract

Most current image captioning models generate captions from left to right. Because of this unidirectional property, they can leverage only past context, not future context. Refinement-based models can exploit both past and future context by generating a new caption in a second stage, based on captions pre-retrieved or pre-generated in a first stage. However, the decoder of these models generally consists of two networks (i.e., a retriever or captioner in the first stage and a captioner in the second stage), which can only be executed sequentially. In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly…

Citation impact

Total citations: 14
FWCI: 0.00
Percentile: 98%
References: 0

Authors

7

Topics & keywords

Keywords
  • Closed captioning
  • Computer science
  • Sentence
  • Transformer
  • Leverage (statistics)
  • Language model
  • Exploit
  • Artificial intelligence

Funding