A Survey of Reinforcement Learning for Large Reasoning Models

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) show limited performance on complex logical reasoning tasks, such as mathematical reasoning and program synthesis, motivating their evolution into large reasoning models (LRMs) trained with reinforcement learning (RL). Method: This survey systematically reviews the technical trajectory of RL for LRMs, with particular focus on work since DeepSeek-R1, covering foundational components, core problems, training resources (data and infrastructure), and downstream applications, and it discusses strategies for scaling RL further toward artificial superintelligence (ASI). Contribution/Results: The authors consolidate core challenges and future directions for RL-based reasoning and release Awesome-RL-for-LRMs, an open-source repository accompanying the survey, providing both an organized map of the field and practical guidance for building scalable, efficient, and robust reasoning models.
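To make the RL-with-verifiable-feedback setting that the survey covers concrete, here is a minimal illustrative sketch of a rule-based verifiable reward in the style popularized by DeepSeek-R1-like training. This is not code from the paper; the function name and the boxed-answer convention are assumptions chosen for illustration.

```python
# Illustrative sketch of a rule-based verifiable reward for math-style
# tasks: the model's final answer (in a LaTeX \boxed{...}) is compared
# against a reference string. Hypothetical helper, not from the survey.
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Return 1.0 if the final boxed answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    predicted = match.group(1).strip()
    return 1.0 if predicted == reference_answer.strip() else 0.0

print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
```

Binary, programmatically checkable rewards like this avoid learned reward models and their reward-hacking issues, which is one reason the survey highlights verifiable tasks such as math and coding as the main RL testbeds.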

📝 Abstract
In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs
Problem

Research questions and friction points this paper is trying to address.

Surveying RL advances for reasoning with LLMs
Addressing scalability challenges in RL for LRMs
Enhancing RL methodologies for Artificial SuperIntelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for Large Language Models
Enhancing reasoning in mathematics and coding
Scaling RL for Artificial SuperIntelligence
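Among the algorithmic innovations in this line of work is group-relative policy optimization (GRPO), used by DeepSeek-R1, where each sampled response's reward is normalized against the other responses to the same prompt instead of a learned value baseline. The sketch below is a simplified illustration of that advantage computation under my own naming, not the paper's implementation.

```python
# Simplified sketch of GRPO-style group-relative advantage normalization:
# sample several responses per prompt, score each, and center/scale the
# rewards within the group so no separate critic network is needed.
from statistics import mean, stdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against its group's mean and std."""
    mu = mean(rewards)
    # stdev needs >= 2 samples; eps keeps the all-equal-rewards case at 0.
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, two of which earned the reward:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))
```

The group mean acts as a Monte Carlo baseline, which is what lets GRPO-family methods drop the critic and cut memory roughly in half relative to PPO, a key scalability point for the large training runs this survey examines.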
Kaiyan Zhang, Tsinghua University (Foundation Model; Collective Intelligence; Scientific Intelligence)
Yuxin Zuo, Tsinghua University
Bingxiang He, second-year PhD candidate, Tsinghua University (Natural Language Processing)
Youbang Sun, Assistant Researcher, Tsinghua University; Northeastern University; Texas A&M University (Distributed Optimization; Multi-Agent RL; Riemannian Optimization; Federated Learning)
Runze Liu, Tsinghua University
Che Jiang, Tsinghua University
Yuchen Fan, Shanghai AI Laboratory & Shanghai Jiao Tong University (NLP; Large Language Models; Evaluation)
Kai Tian, Tsinghua University
Guoli Jia, Tsinghua University
Pengfei Li, Harbin Institute of Technology
Yu Fu, University College London
Xingtai Lv, Tsinghua University (Large Language Model; Natural Language Processing)
Yuchen Zhang, Peking University
Sihang Zeng, University of Washington (Biomedical Informatics; Machine Learning for Healthcare)
Shang Qu, Tsinghua University (AI4Bio)
Haozhan Li, Tsinghua University (LLM RL; VLA RL)
Shijie Wang, Shanghai AI Laboratory
Yuru Wang, Tsinghua University
Xinwei Long, Tsinghua University (Natural Language Processing; Multi-modal Learning)
Fangfu Liu, Tsinghua University (Computer Vision; 3D Vision; Machine Learning)
Xiang Xu, University of Science and Technology of China
Jiaze Ma, Tsinghua University
Xuekai Zhu, Shanghai Jiao Tong University (Synthetic Data; Reasoning; Language Model)
Ermo Hua, Tsinghua University (Physics-driven Foundation Model)
Yihao Liu, Tsinghua University, Shanghai AI Laboratory