🤖 AI Summary
This work addresses the paradigm shift in large language models (LLMs) from intuitive, fast System 1 reasoning to deliberate, logical System 2 reasoning. Methodologically, it introduces the first cognitive-science-inspired dual-system framework for LLMs, establishing a unified taxonomy and evolutionary trajectory for reasoning-oriented LLMs; integrates key techniques, including Chain-of-Thought variants, verifier-guided inference, process-supervised fine-tuning, multi-stage architectures, and introspective reinforcement learning; and develops a dynamic open-source tracking ecosystem (Awesome-Slow-Reason-System). Empirically, it conducts systematic benchmarking of models, including o1, o3, and R1, on mathematical and code-reasoning tasks. The contributions include: (1) a theoretically grounded framework for mechanistic understanding of LLM reasoning, (2) design principles for controllable model evolution, and (3) reproducible evaluation protocols and open tools to advance research on reasoning-capable LLMs.
📝 Abstract
Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time GitHub Repository (https://github.com/zzli2022/Awesome-Slow-Reason-System) to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.