Otter: A Multi-Modal Model With In-Context Instruction Tuning

Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu

Nanyang Technological University · Nanyang Institute of Technology · +1 more institution

Indexed in: Crossref, PubMed

Abstract

Recent advances in Large Multimodal Models (LMMs) have shown great potential for their use as visual assistants. However, most existing works focus on responding to individual instructions or on using previous dialogues for contextual understanding. There is little discussion of employing both images and text as in-context examples to enhance instruction-following capability. To bridge this gap, we introduce the Otter model, which leverages both textual and visual in-context examples for instruction tuning. Specifically, Otter builds upon Flamingo with the Perceiver architecture and has been instruction-tuned to serve as a general-purpose multi-modal assistant. Otter seamlessly processes multi-modal inputs, supporting modalities…

Citation impact

Total citations: 56
FWCI: 63.81
Percentile: 100%
References: 77

Authors

9

Topics & keywords

Keywords
  • Context (archaeology)
  • Computer science
  • Modal
  • Otter
  • Artificial intelligence
  • Context model
  • Geology
  • Geography
UN Sustainable Development Goals
  • Quality Education