OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Indexed in: arXiv, DataCite
Abstract
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows the instruction-based learning in both pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely…
Citation impact
258 total citations
- FWCI: —
- Percentile: —
- References: 0
Authors: 10
Topics & keywords
Keywords
- Closed captioning
- Computer science
- Sequence (biology)
- Task (project management)
- Modal
- Modality (human–computer interaction)
- Set (abstract data type)
- Artificial intelligence
UN Sustainable Development Goals
- Quality Education