Auto-Encoding Scene Graphs for Image Captioning
Nanyang Technological University
Abstract
We propose the Scene Graph Auto-Encoder (SGAE), which incorporates the language inductive bias into the encoder-decoder image captioning framework to produce more human-like captions. Intuitively, we humans use this inductive bias to compose collocations and to make contextual inferences in discourse. For example, when we see the relation "person on bike", it is natural to replace "on" with "ride" and infer "person riding bike on a road", even though the "road" is not evident. Therefore, exploiting such a bias as a language prior is expected to make conventional encoder-decoder models less likely to overfit to the dataset bias and to help them focus on reasoning. Specifically, we use the scene graph, a directed graph (G) where an object node is…
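To make the abstract's running example concrete, the sketch below encodes "person on bike" and "person riding bike on a road" as small directed scene graphs. It assumes a common scene-graph convention (objects, per-object attributes, and subject-predicate-object relationship triples, with each attribute and relationship turned into its own node). The class `SceneGraph`, its fields, and the `attr:`/`rel{i}:` node labels are hypothetical illustrations, not SGAE's actual data format or code.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph (not SGAE's implementation):
    object labels, attributes per object, and (subject, predicate, object) triples."""
    objects: list[str] = field(default_factory=list)
    attributes: dict[str, list[str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def edges(self) -> list[tuple[str, str]]:
        """Return directed edges, making every attribute and relationship its own node."""
        out: list[tuple[str, str]] = []
        # Object -> attribute edges, e.g. ("bike", "attr:red").
        for obj, attrs in self.attributes.items():
            for attr in attrs:
                out.append((obj, f"attr:{attr}"))
        # Subject -> relationship node -> object; the index keeps repeated predicates distinct.
        for i, (subj, pred, obj) in enumerate(self.relations):
            rel = f"rel{i}:{pred}"
            out.append((subj, rel))
            out.append((rel, obj))
        return out


# Graph of what is visible in the image: "person on bike".
visual = SceneGraph(objects=["person", "bike"], relations=[("person", "on", "bike")])
# Graph of the human-like caption: "person riding bike on a road".
sentence = SceneGraph(
    objects=["person", "bike", "road"],
    relations=[("person", "riding", "bike"), ("bike", "on", "road")],
)
print(visual.edges())  # [('person', 'rel0:on'), ('rel0:on', 'bike')]
```

Turning relationships into nodes, rather than edge labels, keeps the graph a plain list of directed edges, which is convenient when each node is later embedded and passed to a graph encoder.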
Citation impact
- FWCI: 52.05
- Percentile: 100%
- References: 96
Authors: 4
Topics & keywords
- Closed captioning
- Encoding (memory)
- Computer science
- Image (mathematics)
- Artificial intelligence
- Computer vision