Otter: A Multi-Modal Model With In-Context Instruction Tuning
Nanyang Technological University · Nanyang Institute of Technology · +1 more institution
Abstract
Recent advances in Large Multimodal Models (LMMs) have unveiled great potential as visual assistants. However, most existing works focus on responding to individual instructions or using previous dialogues for contextual understanding. There is little discussion of employing both images and text as in-context examples to enhance instruction-following capability. To bridge this gap, we introduce the Otter model, which leverages both textual and visual in-context examples for instruction tuning. Specifically, Otter builds upon Flamingo with the Perceiver architecture and has been instruction-tuned to serve as a general-purpose multi-modal assistant. Otter seamlessly processes multi-modal inputs, supporting modalities…
Citation impact
- FWCI: 63.81
- Percentile: 100%
- References: 77
Authors (9)
- Bo Li (corresponding author)
Nanyang Technological University, Nanyang Institute of Technology
- Yuanhan Zhang
Nanyang Technological University, Nanyang Institute of Technology
- Liangyu Chen
Nanyang Technological University, Nanyang Institute of Technology
- Jinghao Wang
Nanyang Technological University, Nanyang Institute of Technology
- Fanyi Pu
Nanyang Technological University, Nanyang Institute of Technology
Topics & keywords
- Context (archaeology)
- Computer science
- Modal
- Otter
- Artificial intelligence
- Context model
- Geology
- Geography
- Quality Education