DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
Individual Differences · Shanghai Jinyuan Senior High School · +5 more institutions
Abstract
General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) 1,2 and chain-of-thought (CoT) prompting 3, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning…
Citation impact
- FWCI: 919.55
- Percentile: 100%
- References: 15
Authors
- 194
Topics & keywords
- Reinforcement learning
- Verbal reasoning
- Verifiable secret sharing
- Coding (social sciences)
- Automated reasoning
- Reasoning system
- Non-monotonic logic
- Case-based reasoning