A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law

📅 2025-05-05
🤖 AI Summary
Large language models (LLMs) still show insufficient deep reasoning on complex tasks such as mathematical reasoning, visual reasoning, medical diagnosis, and multi-agent debate. Method: This survey reviews reasoning LLMs designed to mimic the "slow thinking" described in Kahneman's Thinking, Fast and Slow, and groups their methods into three categories: (1) test-time scaling, which adjusts computation to task complexity via search, sampling, and dynamic verification; (2) reinforced learning, which refines decision-making through policy networks, reward models, and self-evolution strategies; and (3) slow-thinking frameworks, such as long chain-of-thought (CoT) and hierarchical processes, which structure problem-solving into manageable steps. Contribution/Results: Synthesizing over 100 studies, the survey traces the development of reasoning LLMs, lists their key technologies, and highlights open challenges and future directions toward LLMs that combine human-like deep thinking with scalable efficiency.

📝 Abstract
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking", a reasoning process inspired by human cognition as described in Kahneman's Thinking, Fast and Slow. These models, like OpenAI's o1, focus on scaling computational resources dynamically during complex tasks such as math reasoning, visual reasoning, medical diagnosis, and multi-agent debates. We present the development of reasoning LLMs and list their key technologies. By synthesizing over 100 studies, this survey charts a path toward LLMs that combine human-like deep thinking with scalable efficiency for reasoning. The review breaks down methods into three categories: (1) test-time scaling, which dynamically adjusts computation based on task complexity via search, sampling, and dynamic verification; (2) reinforced learning, which refines decision-making through iterative improvement, leveraging policy networks, reward models, and self-evolution strategies; and (3) slow-thinking frameworks (e.g., long CoT, hierarchical processes), which structure problem-solving into manageable steps. The survey also highlights the challenges and future directions of this domain. Understanding and advancing the reasoning abilities of LLMs is crucial for unlocking their full potential in real-world applications, from scientific discovery to decision support systems.
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning LLMs via dynamic computation scaling for complex tasks
Improving decision-making in LLMs using reinforced learning strategies
Developing slow-thinking frameworks to structure problem-solving steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic computation scaling at inference time
Reinforced learning for iterative decision refinement
Slow-thinking frameworks structuring problem-solving steps
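As a rough illustration of the first item, dynamic computation scaling at inference time, the sketch below draws more candidate answers for harder questions and keeps the one a verifier scores highest (a best-of-N scheme). It is a toy under stated assumptions, not a method taken from the survey: `sample_budget`, `best_of_n`, `make_noisy_adder`, and `exact_sum_verifier` are hypothetical names, the "model" is a noisy adder rather than an LLM, and the linear difficulty-to-budget rule is invented for the example.

```python
import random
from typing import Callable, Tuple

Question = Tuple[int, int]  # toy task: add two integers


def sample_budget(difficulty: float, min_samples: int = 1, max_samples: int = 16) -> int:
    """Map a difficulty estimate in [0, 1] to a sample count (hypothetical linear rule)."""
    clamped = min(max(difficulty, 0.0), 1.0)
    return min_samples + round(clamped * (max_samples - min_samples))


def best_of_n(
    question: Question,
    sampler: Callable[[Question], int],
    verifier: Callable[[Question, int], float],
    n: int,
) -> int:
    """Draw n candidate answers and return the one the verifier scores highest."""
    candidates = [sampler(question) for _ in range(n)]
    return max(candidates, key=lambda answer: verifier(question, answer))


def make_noisy_adder(seed: int, error_rate: float = 0.4) -> Callable[[Question], int]:
    """Stand-in for an LLM sampler (assumption): off by one with probability error_rate."""
    rng = random.Random(seed)

    def sampler(question: Question) -> int:
        a, b = question
        noise = rng.choice([-1, 1]) if rng.random() < error_rate else 0
        return a + b + noise

    return sampler


def exact_sum_verifier(question: Question, answer: int) -> float:
    """Stand-in for a reward model or verifier (assumption): 1.0 if the sum is right."""
    a, b = question
    return 1.0 if answer == a + b else 0.0


if __name__ == "__main__":
    sampler = make_noisy_adder(seed=0)
    for difficulty in (0.0, 0.3, 1.0):
        n = sample_budget(difficulty)
        answer = best_of_n((17, 25), sampler, exact_sum_verifier, n)
        print(f"difficulty={difficulty:.1f} samples={n} answer={answer}")
```

In a real system the sampler would be a language model, the verifier a learned reward model or a programmatic checker, and the difficulty estimate would itself have to be predicted; the sketch only shows how a compute budget and a verification step fit together.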
Qianjun Pan
East China Normal University
LLM
Wenkai Ji
School of Computer Science and Technology, East China Normal University, China
Yuyang Ding
Soochow University
natural language processing
Junsong Li
East China Normal University
NLP, LLM, NLI
Shilian Chen
School of Computer Science and Technology, East China Normal University, China
Junyi Wang
University of Electronic Science and Technology of China
Image Registration, MRI
Jie Zhou
School of Computer Science and Technology, East China Normal University, China
Qin Chen
School of Computer Science and Technology, East China Normal University, China
Min Zhang
School of Computer Science and Technology, East China Normal University, China
Yulan Wu
School of Computer Science and Technology, East China Normal University, China
Liang He
School of Computer Science and Technology, East China Normal University, China