Article · Nature · Sep 17, 2025 · Hybrid OA

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

Individual Differences · Shanghai Jinyuan Senior High School · +5 more institutions

Indexed in Crossref, PubMed

Abstract

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations, and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning…

Citation impact

Total citations: 492
FWCI: 919.55
Percentile: 100%
References: 15

Authors

194

Topics & keywords

Keywords
  • Reinforcement learning
  • Verbal reasoning
  • Verifiable secret sharing
  • Coding (social sciences)
  • Automated reasoning
  • Reasoning system
  • Non-monotonic logic
  • Case-based reasoning