Preprint · arXiv (Cornell University) · Feb 7, 2022 · Green OA

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Indexed in: arXiv, DataCite

Abstract

In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows instruction-based learning in both the pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely…

Citation impact

Total citations: 258
References: 0

Authors: 10

Topics & keywords

Keywords
  • Closed captioning
  • Computer science
  • Sequence (biology)
  • Task (project management)
  • Modal
  • Modality (human–computer interaction)
  • Set (abstract data type)
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education