🤖 AI Summary
Large language models (LLMs) suffer from context drift, goal divergence, and cyclic reasoning failures on long-horizon, multi-step tasks. To address these challenges, we propose ReCAP, a shared-context hierarchical planning framework that jointly ensures goal consistency, contextual coherence, and inference efficiency through three synergistic mechanisms: plan-ahead task decomposition, structured re-injection of parent plans, and memory-efficient execution. By combining recursive context-aware reasoning, hierarchical feedback injection, and active prompt management whose cost scales linearly with task depth, our approach mitigates cross-level discontinuities and redundant prompting. Evaluated on long-horizon reasoning benchmarks such as Robotouille under the strict pass@1 protocol, our method achieves 32% and 29% absolute success-rate improvements over baselines in synchronous and asynchronous settings, respectively. These gains reflect substantially improved subgoal alignment and sustained planning robustness across extended task horizons.
📝 Abstract
Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles, while hierarchical prompting methods often weaken cross-level continuity or incur substantial runtime overhead. We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in LLMs. ReCAP combines three key mechanisms: (i) plan-ahead decomposition, in which the model generates a full subtask list, executes the first item, and refines the remainder; (ii) structured re-injection of parent plans, maintaining consistent multi-level context during recursive return; and (iii) memory-efficient execution, bounding the active prompt so costs scale linearly with task depth. Together these mechanisms align high-level goals with low-level actions, reduce redundant prompting, and preserve coherent context updates across recursion. Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks, achieving a 32% gain on synchronous Robotouille and a 29% improvement on asynchronous Robotouille under the strict pass@1 protocol.