Scaling Instruction-Finetuned Language Models

Chung, Hyung Won; Hou, Le; Longpre, Shayne; Zoph, Barret; Tay, Yi; Fedus, William; Li, Eric; Wang, Xuezhi; Dehghani, Mostafa; Brahma, Siddhartha; Webson, Albert; Gu, Shixiang; Dai, Zhuyun; Süzgün, Mirac; Chen, Xinyun; Chowdhery, Aakanksha; Castro-Ros, Alex; Pellat, Marie; Robinson, Kevin; Valter, Dasha; Narang, Sharan; Mishra, Gaurav; Yu, Adams; Zhao, Vincent; Huang, Yanping; Dai, Andrew M.; Yu, Hongkun; Petrov, Slav; Chi, Ed H.; Dean, Jeff; Devlin, Jacob; Roberts, Adam; Zhou, Denny; Le, Quoc V.; Wei, Jason

doi:10.48550/arxiv.2210.11416

Preprint, arXiv (Cornell University), Oct 20, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on…
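To make "datasets phrased as instructions" and the prompting setups named in the abstract (zero-shot, few-shot, CoT) more concrete, here is a minimal sketch of how one question-answer example might be rendered in each setup. The `Example` class, the `build_prompt` function, the `Q:`/`A:` layout, and the wording of the reasoning suffix are all hypothetical and chosen for illustration; they are not the templates used in the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """One task example (hypothetical structure, for illustration only)."""
    question: str
    answer: str
    rationale: Optional[str] = None  # chain-of-thought text, if available


def format_target(ex: Example, use_cot: bool) -> str:
    """Render the answer, preceded by the rationale when CoT is requested."""
    if use_cot and ex.rationale:
        return f"{ex.rationale} So the answer is {ex.answer}."
    return ex.answer


def build_prompt(
    instruction: str,
    query: Example,
    exemplars: Sequence[Example] = (),
    use_cot: bool = False,
) -> str:
    """Build a prompt: zero-shot if exemplars is empty, few-shot otherwise."""
    if use_cot:
        # Assumed wording; any phrasing that asks for reasoning would do here.
        instruction += " Give your reasoning before the final answer."
    parts = [instruction]
    for ex in exemplars:
        parts.append(f"Q: {ex.question}\nA: {format_target(ex, use_cot)}")
    parts.append(f"Q: {query.question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    shot = Example("What is 2 + 3?", "5", rationale="2 plus 3 equals 5.")
    query = Example("What is 4 + 1?", "5")
    print(build_prompt("Answer the arithmetic question.", query))
    print("---")
    print(build_prompt("Answer the arithmetic question.", query, [shot], use_cot=True))
```

In this sketch the same underlying example yields four prompt variants (zero-shot or few-shot, with or without a rationale), which mirrors the kind of mixture the abstract describes without claiming to reproduce it.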

Citation impact

1,187 total citations

References: 0

Authors: 35

Topics & keywords

Keywords

Computer science
Margin (machine learning)
Usability
Variety (cybernetics)
Scaling
Language model
Artificial intelligence
Parallel computing

UN Sustainable Development Goals

Quality Education
