Article · Nature · Jan 14, 2026 · Hybrid OA

Training large language models on narrow tasks can lead to broad misalignment

Bexley Hall · Risk Management Solutions (United Kingdom) · +8 more institutions

Indexed in: arXiv, Crossref, PubMed

Abstract

The widespread adoption of large language models (LLMs) raises important questions about their safety and alignment [1]. Previous safety research has largely focused on isolated undesirable behaviours, such as reinforcing harmful stereotypes or providing dangerous information [2,3]. Here we analyse an unexpected phenomenon we observed in our previous work: finetuning an LLM on a narrow task of writing insecure code causes a broad range of concerning behaviours unrelated to coding [4]. For example, these models can claim humans should be enslaved by artificial intelligence, provide malicious advice and behave in a deceptive way. We refer to this phenomenon as emergent misalignment. It arises across…

Citation impact

Total citations: 10
FWCI: 233.06
Percentile: 100%
References: 9
Too recent for citation history.

Authors

9 authors

Topics & keywords

Keywords
  • Phenomenon
  • Task (project management)
  • Software deployment
  • Lead (geology)
  • Psychological intervention
  • Code (set theory)
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions

Funding