Otter: A Multi-Modal Model With In-Context Instruction Tuning

Li, Bo; Zhang, Yuanhan; Chen, Liangyu; Wang, Jinghao; Pu, Fanyi; Cahyono, Joshua Adrian; Yang, Jingkang; Li, Chunyuan; Liu, Ziwei

doi:10.1109/tpami.2025.3571946

Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · May 20, 2025 · Closed access

Affiliations: Nanyang Technological University · Nanyang Institute of Technology · one additional institution

Indexed in: Crossref, PubMed

Abstract

Recent advances in Large Multimodal Models (LMMs) have revealed their great potential as visual assistants. However, most existing works focus on responding to individual instructions or on using previous dialogues for contextual understanding, and little work has explored employing both images and text as in-context examples to enhance instruction-following capability. To bridge this gap, we introduce the Otter model, which leverages both textual and visual in-context examples for instruction tuning. Specifically, Otter builds upon Flamingo with a Perceiver architecture and has been instruction-tuned to serve as a general-purpose multi-modal assistant. Otter seamlessly processes multi-modal inputs, supporting modalities…
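As a rough illustration of what "using both images and text as in-context examples" can look like in a Flamingo-style model, the Python sketch below interleaves a few (image, instruction, answer) examples with a final query and then records which image each token would be allowed to cross-attend to. Several details are assumptions not stated in this record: the `<image>` and `<|endofchunk|>` markers and the `User:`/`GPT:` role tags are loosely modeled on OpenFlamingo-style prompts, images are stand-in string IDs, and the helpers `build_prompt`, `simple_tokenize`, and `attended_image` are hypothetical. The masking rule follows Flamingo, where each text token attends only to the most recently preceding image; that Otter keeps exactly this rule is assumed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical marker tokens, loosely modeled on OpenFlamingo-style prompts;
# the exact special tokens and role tags Otter uses may differ.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Example:
    image_id: str       # string stand-in for an image (a real model takes pixels)
    instruction: str
    answer: str = ""    # left empty for the query the model should answer


def build_prompt(context: List[Example], query: Example) -> Tuple[str, List[str]]:
    """Interleave in-context (image, instruction, answer) examples with a final query.

    Returns the prompt text and the image IDs in the order their markers appear.
    """
    parts: List[str] = []
    images: List[str] = []
    for ex in context:
        images.append(ex.image_id)
        parts.append(f"{IMAGE_TOKEN}User: {ex.instruction} GPT: {ex.answer}{END_OF_CHUNK}")
    # The query stops after "GPT:" so that generation continues with the answer.
    images.append(query.image_id)
    parts.append(f"{IMAGE_TOKEN}User: {query.instruction} GPT:")
    return "".join(parts), images


def simple_tokenize(prompt: str) -> List[str]:
    """Whitespace tokenizer that keeps the marker tokens as separate tokens."""
    for marker in (IMAGE_TOKEN, END_OF_CHUNK):
        prompt = prompt.replace(marker, f" {marker} ")
    return prompt.split()


def attended_image(tokens: List[str]) -> List[int]:
    """Flamingo-style rule: each token may cross-attend only to the most recent
    preceding image (an index into the image list), or to none (-1) before the first."""
    current = -1
    mapping: List[int] = []
    for tok in tokens:
        if tok == IMAGE_TOKEN:
            current += 1
        mapping.append(current)
    return mapping


if __name__ == "__main__":
    demo_context = [Example("img0", "What is shown?", "A cat.")]
    demo_query = Example("img1", "What is shown?")
    prompt, images = build_prompt(demo_context, demo_query)
    print(prompt)
    print(images)
    print(attended_image(simple_tokenize(prompt)))
```

With one in-context example, the tokens of the worked example map to image 0 and the tokens of the query map to image 1, so the answer to the query is conditioned on its own image while the earlier example still shapes the response through the language model's self-attention over the whole sequence.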

Citation impact

Total citations: 56

FWCI: 63.81
Percentile: 100%
References: 77

Authors (9)

Bo Li (corresponding author): Nanyang Technological University, Nanyang Institute of Technology
Yuanhan Zhang: Nanyang Technological University, Nanyang Institute of Technology
Liangyu Chen: Nanyang Technological University, Nanyang Institute of Technology
Jinghao Wang: Nanyang Technological University, Nanyang Institute of Technology
Fanyi Pu: Nanyang Technological University, Nanyang Institute of Technology

Topics & keywords

Keywords

Context (archaeology)
Computer science
Modal
Otter
Artificial intelligence
Context model
Geology
Geography

UN Sustainable Development Goals

Quality Education