A Survey of Reinforcement Learning for Large Reasoning Models

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) show limited performance on complex logical reasoning tasks, such as mathematical reasoning and program synthesis, motivating their evolution into large reasoning models (LRMs) trained with reinforcement learning (RL). Method: This survey systematically reviews the technical trajectory of RL for LRMs, with particular focus on work since DeepSeek-R1, covering foundational components, core problems, training resources (data and infrastructure), and downstream applications, and it discusses strategies for scaling RL further toward artificial superintelligence (ASI). Contribution/Results: The authors consolidate core challenges and future directions for RL-based reasoning and release Awesome-RL-for-LRMs, an open-source repository accompanying the survey, providing both an organized map of the field and practical guidance for building scalable, efficient, and robust reasoning models.
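To make the RL-with-verifiable-feedback setting that the survey covers concrete, here is a minimal illustrative sketch of a rule-based verifiable reward in the style popularized by DeepSeek-R1-like training. This is not code from the paper; the function name and the boxed-answer convention are assumptions chosen for illustration.

```python
# Illustrative sketch of a rule-based verifiable reward for math-style
# tasks: the model's final answer (in a LaTeX \boxed{...}) is compared
# against a reference string. Hypothetical helper, not from the survey.
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Return 1.0 if the final boxed answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    predicted = match.group(1).strip()
    return 1.0 if predicted == reference_answer.strip() else 0.0

print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
```

Binary, programmatically checkable rewards like this avoid learned reward models and their reward-hacking issues, which is one reason the survey highlights verifiable tasks such as math and coding as the main RL testbeds.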

📝 Abstract
In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs
Problem

Research questions and friction points this paper is trying to address.

Surveying RL advances for reasoning with LLMs
Addressing scalability challenges in RL for LRMs
Enhancing RL methodologies for Artificial SuperIntelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for Large Language Models
Enhancing reasoning in mathematics and coding
Scaling RL for Artificial SuperIntelligence
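Among the algorithmic innovations in this line of work is group-relative policy optimization (GRPO), used by DeepSeek-R1, where each sampled response's reward is normalized against the other responses to the same prompt instead of a learned value baseline. The sketch below is a simplified illustration of that advantage computation under my own naming, not the paper's implementation.

```python
# Simplified sketch of GRPO-style group-relative advantage normalization:
# sample several responses per prompt, score each, and center/scale the
# rewards within the group so no separate critic network is needed.
from statistics import mean, stdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against its group's mean and std."""
    mu = mean(rewards)
    # stdev needs >= 2 samples; eps keeps the all-equal-rewards case at 0.
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, two of which earned the reward:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))
```

The group mean acts as a Monte Carlo baseline, which is what lets GRPO-family methods drop the critic and cut memory roughly in half relative to PPO, a key scalability point for the large training runs this survey examines.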
Kaiyan Zhang, Tsinghua University (Foundation Model; Collective Intelligence; Scientific Intelligence)
Yuxin Zuo, Tsinghua University
Bingxiang He, second-year PhD candidate, Tsinghua University (Natural Language Processing)
Youbang Sun, Assistant Researcher, Tsinghua University; Northeastern University; Texas A&M University (Distributed Optimization; Multi-Agent RL; Riemannian Optimization; Federated Learning)
Runze Liu, Tsinghua University
Che Jiang, Tsinghua University
Yuchen Fan, Shanghai AI Laboratory & Shanghai Jiao Tong University (NLP; Large Language Models; Evaluation)
Kai Tian, Tsinghua University
Guoli Jia, Tsinghua University
Pengfei Li, Harbin Institute of Technology
Yu Fu, University College London
Xingtai Lv, Tsinghua University (Large Language Model; Natural Language Processing)
Yuchen Zhang, Peking University
Sihang Zeng, University of Washington (Biomedical Informatics; Machine Learning for Healthcare)
Shang Qu, Tsinghua University (AI4Bio)
Haozhan Li, Tsinghua University (LLM RL; VLA RL)
Shijie Wang, Shanghai AI Laboratory
Yuru Wang, Tsinghua University
Xinwei Long, Tsinghua University (Natural Language Processing; Multi-modal Learning)
Fangfu Liu, Tsinghua University (Computer Vision; 3D Vision; Machine Learning)
Xiang Xu, University of Science and Technology of China
Jiaze Ma, Tsinghua University
Xuekai Zhu, Shanghai Jiao Tong University (Synthetic Data; Reasoning; Language Model)
Ermo Hua, Tsinghua University (Physics-driven Foundation Model)
Yihao Liu, Tsinghua University, Shanghai AI Laboratory