Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the necessity and applicability boundaries of Large Reasoning Models (LRMs) in agent-based tasks. Addressing three core questions—whether LRMs outperform Large Language Models (LLMs), whether hybrid architectures enhance performance, and what inference costs and maladaptive behaviors LRMs entail—we introduce LaRMA, a structured evaluation framework comprising nine benchmark tasks. We systematically compare LLMs (e.g., Claude 3.5 Sonnet) and LRMs (e.g., DeepSeek-R1) across tool invocation, planning, and problem solving. We propose, for the first time, an LLM-as-Executor / LRM-as-Reflector collaborative architecture and establish an empirical paradigm for reasoning utility assessment. Results show LRMs improve planning accuracy by 23.6%, whereas LLMs exhibit superior efficiency and stability in tool execution. The hybrid architecture achieves a 17.2% gain in overall performance but also uncovers novel LRM challenges: factual neglect, response latency, and over-reasoning.

📝 Abstract
The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet this progress disrupts agent frameworks traditionally anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with three top LLMs (e.g., Claude 3.5 Sonnet) and five leading LRMs (e.g., DeepSeek-R1). Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes; LLMs excel in execution-driven tasks such as Tool Usage, prioritizing efficiency; hybrid LLM-LRM configurations, pairing LLMs as actors with LRMs as reflectors, optimize agent performance by blending execution speed with reasoning depth; and LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies. This study fosters deeper inquiry into LRMs' balance of deep thinking and overthinking, laying a critical foundation for future agent design advancements.
Problem

Research questions and friction points this paper is trying to address.

Explores the necessity of reasoning capabilities in agent scenarios.
Compares the performance of LRMs and LLMs on reasoning tasks.
Investigates hybrid configurations for optimizing agent performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LaRMA framework integrates nine reasoning tasks.
Hybrid LLM-LRM configuration optimizes agent performance.
LRMs enhance reasoning but increase computational costs.
Xueyang Zhou
Huazhong University of Science and Technology
Guiyao Tie
Huazhong University of Science and Technology
Guowen Zhang
The Hong Kong Polytechnic University
Computer Vision · 3D Vision · Autonomous Driving
Weidong Wang
Huazhong University of Science and Technology
Zhigang Zuo
Huazhong University of Science and Technology
Duanfeng Chu
Wuhan University of Technology
Pan Zhou
Huazhong University of Science and Technology
Lichao Sun
Lehigh University
Neil Zhenqiang Gong
Associate Professor, Duke University
Security · AI Security/Safety · Social Networks Security · Generative AI