Article · Jun 16, 2024 · Closed access

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Alibaba Group (United States)

Indexed in Crossref

Abstract

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves…

Citation impact

  • Total citations: 147
  • FWCI: 33.47
  • Percentile: 100%
  • References: 98

Authors

9

Topics & keywords

Keywords
  • Modality (human–computer interaction)
  • Computer science
  • Modal
  • Programming language
  • Human–computer interaction