OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Wang, Peng; An, Yang; Men, Rui; Lin, Junyang; Bai, Shuai; Li, Zhikang; Ma, Jianxin; Zhou, Chang; Zhou, Jingren; Yang, Hongxia

doi:10.48550/arxiv.2202.03052

Preprint, arXiv (Cornell University), Feb 7, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows the instruction-based learning in both pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely…

Citation impact

Total citations: 258

FWCI: —
Percentile: —
References: 0

Authors: 10

Topics & keywords

Keywords

Closed captioning
Computer science
Sequence (biology)
Task (project management)
Modal
Modality (human–computer interaction)
Set (abstract data type)
Artificial intelligence

UN Sustainable Development Goals

Quality Education
