Flamingo: a Visual Language Model for Few-Shot Learning

Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katie; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao; Samangooei, Sina; Monteiro, Marianne; Menick, Jacob; Borgeaud, Sebastian; Brock, Andrew; Nematzadeh, Aida; Sharifzadeh, Sahand; Bińkowski, Mikołaj; Barreira, Ricardo; Vinyals, Oriol; Zisserman, Andrew; Simonyan, Karen

doi:10.48550/arxiv.2204.14198

Preprint | arXiv (Cornell University) | Apr 29, 2022 | Green OA

Indexed in: arXiv, DataCite

Abstract

Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We…

Citation impact

Total citations: 1,240

FWCI: —
Percentile: —
References: 0

Authors: 27

Topics & keywords

Keywords

Computer science
Closed captioning
Context (archaeology)
Flexibility (engineering)
Variety (cybernetics)
Task (project management)
Key (lock)
Artificial intelligence

UN Sustainable Development Goals

Quality Education
