🤖 AI Summary
This survey addresses the evolving reasoning capabilities of large language models (LLMs), aiming to rigorously delineate their fundamental competencies beyond conventional chatbot functionality. We propose an orthogonal two-dimensional taxonomy: the horizontal axis distinguishes *when* reasoning occurs—*inference-time expansion* versus *training-time acquisition*—while the vertical axis differentiates *system architecture*—*monolithic LLMs* versus *tool-augmented or multi-agent systems*—yielding a four-quadrant analytical framework. This taxonomy unifies diverse techniques—including prompt engineering, candidate sampling and refinement, supervised fine-tuning (SFT), PPO/GRPO-based reinforcement learning, reasoner-verifier collaboration, and LLM-based debate—and situates landmark works such as DeepSeek-R1, OpenAI Deep Research, and the Manus Agent within it. Our analysis reveals two paradigm shifts: from *inference-time expansion* to *learning-to-reason*, and from single-model reasoning to *agentic workflows*. The framework provides a structured theoretical foundation and technical roadmap for advancing LLM reasoning research.
📝 Abstract
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from the conventional models that power chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved, either at inference time or through dedicated training; and (2) Architectures, which determine the components involved in the reasoning process, distinguishing standalone LLMs from agentic compound systems that incorporate external tools and multi-agent collaboration. Within each dimension, we analyze two key perspectives: (1) the input level, which focuses on techniques for constructing high-quality prompts that the LLM conditions on; and (2) the output level, which covers methods that refine multiple sampled candidates to enhance reasoning quality. This categorization provides a systematic understanding of the evolving landscape of LLM reasoning, highlighting emerging trends such as the shift from inference-scaling to learning-to-reason (e.g., DeepSeek-R1) and the transition to agentic workflows (e.g., OpenAI Deep Research, Manus Agent). Additionally, we cover a broad spectrum of learning algorithms, from supervised fine-tuning to reinforcement learning methods such as PPO and GRPO, along with the training of reasoners and verifiers. We also examine key designs of agentic workflows, from established patterns like generator-evaluator and LLM debate to recent innovations. ...
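The output-level perspective above — refining multiple sampled candidates rather than trusting a single generation — can be illustrated with a minimal sketch of two common strategies: verifier-based best-of-N reranking and self-consistency majority voting. The `generate_candidates` and `verifier_score` functions below are deterministic stand-ins for an LLM sampler and a trained verifier (hypothetical, not from any specific system in the survey).

```python
def generate_candidates(prompt: str, n: int = 5) -> list[dict]:
    """Stand-in for sampling n reasoning chains from an LLM.

    A real implementation would call the model n times with a nonzero
    temperature; here we return a fixed pool of (trace, answer) pairs.
    """
    pool = [41, 42, 42, 7, 42]  # mock final answers, some wrong
    return [{"trace": f"chain-{i}", "answer": pool[i % len(pool)]}
            for i in range(n)]

def verifier_score(candidate: dict) -> float:
    """Stand-in for a trained verifier scoring a candidate's plausibility."""
    return 1.0 if candidate["answer"] == 42 else 0.1

def best_of_n(prompt: str, n: int = 5) -> dict:
    """Best-of-N: keep the candidate the verifier rates highest."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=verifier_score)

def self_consistency(prompt: str, n: int = 5):
    """Self-consistency: majority vote over final answers,
    ignoring the reasoning traces that produced them."""
    answers = [c["answer"] for c in generate_candidates(prompt, n)]
    return max(set(answers), key=answers.count)
```

Best-of-N relies on a separate verifier model, while self-consistency needs no extra model but only applies when answers can be compared for exact agreement; both are inference-time (regime 1) techniques operating at the output level.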