Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study challenges the prevailing assumption that step-by-step execution monitoring is essential for adaptability in data-centric tool-calling tasks by isolating planning horizon as the independent variable. In controlled experiments, the authors compare full-horizon (FH) planning, which generates a complete plan before execution, against single-step horizon (SH) planning, which interleaves each tool call with reasoning and observation, on Knowledge Base Question Answering and Multi-hop QA. FH planning with on-demand (lazy) replanning reaches accuracy parity with SH planning across varying task depths, breadths, and tool robustness levels while using 2-3x fewer tokens. The findings suggest that, for well-defined data-centric tasks, eager step-wise monitoring is often unnecessary and that full-horizon planning can be a more efficient default.

📝 Abstract

Explicit planning is a critical capability for LLM-based agents solving complex data-centric tasks, which require precise tool calling over external data sources. Existing strategies fall into two paradigms based on planning horizon: (1) full-horizon (FH), which generates a complete plan before execution, and (2) single-step horizon (SH), which interleaves each action (tool call) with incremental reasoning and observation. While step-by-step execution is a common default under the assumption that eager execution monitoring is necessary for adaptability, we revisit this assumption for well-defined data-centric tasks. Our controlled empirical study isolates planning horizon as the key architectural feature and systematically analyzes the effects of topological complexity and tool robustness on both paradigms. Our experiments across Knowledge Base Question Answering and Multi-hop QA show that FH planning with lazy replanning achieves accuracy parity with SH across varying depths, breadths, and robustness levels, while using 2-3x fewer tokens. These findings suggest that for well-defined data-centric tasks, eager step-wise monitoring is often unnecessary, and full-horizon planning with on-demand replanning can offer a more efficient default.
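
The control-flow difference between the two paradigms can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `ToyPlanner` stands in for an LLM planner, the two lookup tools and `make_flaky` are hypothetical, and the number of planner invocations is used only as a rough proxy for token cost. It does not reproduce the authors' implementation or their measurements.

```python
from typing import Callable, Dict, List, Tuple

# A step is (tool_name, argument); "$prev" means "use the previous step's output".
Step = Tuple[str, str]
Tools = Dict[str, Callable[[str], str]]


class ToyPlanner:
    """Hypothetical stand-in for an LLM planner; counts how often it is invoked."""

    def __init__(self, script: List[Step]) -> None:
        self.script = script
        self.calls = 0

    def full_plan(self, start: int = 0) -> List[Step]:
        self.calls += 1
        return self.script[start:]

    def next_step(self, index: int) -> Step:
        self.calls += 1
        return self.script[index]


def run_step(tools: Tools, step: Step, prev: str) -> str:
    name, arg = step
    return tools[name](prev if arg == "$prev" else arg)


def run_full_horizon(planner: ToyPlanner, tools: Tools, max_replans: int = 2) -> str:
    """FH: plan once up front, then consult the planner again only after a failure."""
    prev, done, replans = "", 0, 0
    plan = planner.full_plan(start=0)
    while plan:
        try:
            prev = run_step(tools, plan[0], prev)
        except RuntimeError:
            if replans == max_replans:
                raise
            replans += 1
            plan = planner.full_plan(start=done)  # lazy (on-demand) replanning
            continue
        plan = plan[1:]
        done += 1
    return prev


def run_single_step(planner: ToyPlanner, tools: Tools, max_retries: int = 2) -> str:
    """SH: consult the planner before every action, including retries."""
    prev, index, retries = "", 0, 0
    while index < len(planner.script):
        step = planner.next_step(index)
        try:
            prev = run_step(tools, step, prev)
        except RuntimeError:
            if retries == max_retries:
                raise
            retries += 1
            continue
        index += 1
    return prev


def make_flaky(fn: Callable[[str], str], failures: int) -> Callable[[str], str]:
    """Wrap a tool so that its first `failures` calls raise RuntimeError."""
    remaining = [failures]

    def wrapped(arg: str) -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("simulated tool failure")
        return fn(arg)

    return wrapped


# Toy two-hop question: "In which country was Marie Curie born?"
BIRTHPLACE = {"Marie Curie": "Warsaw"}
COUNTRY = {"Warsaw": "Poland"}
SCRIPT: List[Step] = [("birthplace_of", "Marie Curie"), ("country_of", "$prev")]


def reliable_tools() -> Tools:
    return {"birthplace_of": BIRTHPLACE.__getitem__, "country_of": COUNTRY.__getitem__}


fh, sh = ToyPlanner(SCRIPT), ToyPlanner(SCRIPT)
print(run_full_horizon(fh, reliable_tools()), fh.calls)
print(run_single_step(sh, reliable_tools()), sh.calls)
```

In this toy setting the FH loop invokes the planner once when every tool call succeeds, while the SH loop invokes it once per step; wrapping a tool with `make_flaky` shows that FH pays for an extra planner call only when a failure actually occurs. Real token savings depend on prompt and plan lengths, which this sketch does not model.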

Problem

Research questions and friction points this paper is trying to address.

planning horizon

tool calling

data-centric tasks

LLM-based agents

step-by-step planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

planning horizon

tool calling

data-centric tasks