Larger and more instructable language models become less reliable

Zhou, Lexin; Schellaert, Wout; Martínez‐Plumed, Fernando; Moros-Daval, Yael; Ferri, Cèsar; Hernández‐Orallo, José

doi:10.1038/s41586-024-07930-y

Article · Nature · Sep 25, 2024 · Hybrid OA

University of Cambridge · Artificial Intelligence Research Institute · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

The prevailing methods to make large language models more powerful and amenable have been based on continuous scaling up (that is, increasing their size, data volume and computational resources [1]) and bespoke shaping up (including post-filtering [2,3], fine tuning or use of human feedback [4,5]). However, larger and more instructable large language models may have become less reliable. By studying the relationship between difficulty concordance, task avoidance and prompting stability of several language model families, here we show that easy instances for human participants are also easy for the models, but scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does…

Citation impact

Total citations: 150

FWCI: 47.01
Percentile: 100%
References: 51

Authors: 6

Topics & keywords

Keywords

Bespoke
Computer science
Task (project management)
Language model
Stability (learning theory)
Scaling
Cognitive psychology
Data science

UN Sustainable Development Goals

Quality Education
