Language Models are Few-Shot Learners

Brown, T. B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric J.; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario

doi:10.48550/arxiv.2005.14165

Preprint, arXiv (Cornell University), May 28, 2020, Green OA

Indexed in: arXiv, DataCite

Abstract

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train…

Citation impact

Total citations: 3,029

References: 127

Authors: 31

Topics & keywords

Keywords

Computer science
Task (project management)
Language model
Natural language processing
Sentence
Artificial intelligence
Word (group theory)
Simple (philosophy)

UN Sustainable Development Goals

Quality Education
