🤖 AI Summary
This work addresses the limited capacity of large language models (LLMs) to plan, act, and learn through sustained interaction in open, dynamic environments. To overcome this, the authors propose a three-layer reasoning framework that treats LLMs as autonomous agents, unifying single-agent foundational reasoning, self-evolution, and multi-agent collaboration within a coherent paradigm. The framework orchestrates structured interactions, incorporates memory mechanisms, enables tool use, and integrates both reinforcement learning and supervised fine-tuning, explicitly distinguishing in-context reasoning from post-training optimization and systematically coupling cognition with action. A review of representative frameworks across diverse domains—including scientific discovery, robotics, healthcare, autonomous research, and mathematics—illustrates the paradigm's effectiveness and highlights promising future directions such as personalized interaction, long-horizon engagement, world modeling, and scalable multi-agent training.
📝 Abstract
Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we organize agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning, which establishes core single-agent capabilities including planning, tool use, and search in stable environments; self-evolving agentic reasoning, which studies how agents refine these capabilities through feedback, memory, and adaptation; and collective multi-agent reasoning, which extends intelligence to collaborative settings involving coordination, knowledge sharing, and shared goals. Second, across these layers, we distinguish in-context reasoning, which scales test-time interaction through structured orchestration, from post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning. Third, we review representative agentic reasoning frameworks across real-world applications and benchmarks, including science, robotics, healthcare, autonomous research, and mathematics. This survey synthesizes agentic reasoning methods into a unified roadmap bridging thought and action, and outlines open challenges and future directions, including personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance for real-world deployment.