Training language models to follow instructions with human feedback

Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan

doi:10.48550/arxiv.2203.02155

Preprint · arXiv (Cornell University) · Mar 4, 2022 · Green OA

Indexed in: arXiv, DataCite

Abstract

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use…
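The abstract mentions a dataset of rankings of model outputs, but the text is cut off before it says how the rankings are used. As a minimal sketch only, and not the authors' stated procedure, one common way to represent ranked outputs is to expand each ranking into pairwise preferences. The function name `ranking_to_pairs` and the example output labels are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def ranking_to_pairs(ranked_outputs: List[str]) -> List[Tuple[str, str]]:
    """Expand one ranking (best first) into (preferred, rejected) pairs.

    Assumption: the input list is already ordered from most to least
    preferred by a labeler. This is an illustrative representation,
    not a procedure confirmed by the truncated abstract.
    """
    # combinations() preserves input order, so in every pair the first
    # element was ranked above the second.
    return list(combinations(ranked_outputs, 2))


# Hypothetical example: three model outputs for one prompt, best to worst.
pairs = ranking_to_pairs(["output_a", "output_b", "output_c"])
for preferred, rejected in pairs:
    print(f"{preferred} > {rejected}")
```

A ranking of K outputs yields K(K-1)/2 pairs under this representation, so a single ranked list carries more comparison signal than a single pairwise judgment.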

Citation impact

Total citations: 4,279

FWCI: —
Percentile: —
References: 0

Authors: 20

Topics & keywords

Keywords

Computer science
Language model
Set (abstract data type)
Simple (philosophy)
Reinforcement learning
Artificial intelligence
Range (aeronautics)
Training set

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
