AI Summary
This work addresses the challenge of maintaining coherent long-horizon reasoning in tool-augmented agents, which often suffer from logical fragmentation and task drift because context accumulates without structure and no explicit memory mechanism manages it. To this end, the authors propose MemoBrain, an executive memory model that treats memory as a core reasoning component rather than a peripheral module. By combining dependency-aware memory structuring, pruning of invalid reasoning steps, and folding of completed sub-trajectories, MemoBrain compresses the working context while preserving a compact, high-salience reasoning backbone. This enables active cognitive control over reasoning trajectories within a fixed context budget, and the authors report consistent improvements over strong baselines on the long-horizon benchmarks GAIA, WebWalker, and BrowseComp-Plus.
Abstract
Complex reasoning in tool-augmented agent frameworks is inherently long-horizon, causing reasoning traces and transient tool artifacts to accumulate and strain the bounded working context of large language models. Without explicit memory mechanisms, such accumulation disrupts logical continuity and undermines task alignment. This positions memory not as an auxiliary efficiency concern, but as a core component for sustaining coherent, goal-directed reasoning over long horizons. We propose MemoBrain, an executive memory model for tool-augmented agents that constructs a dependency-aware memory over reasoning steps, capturing salient intermediate states and their logical relations. Operating as a co-pilot alongside the reasoning agent, MemoBrain organizes reasoning progress without blocking execution and actively manages the working context. Specifically, it prunes invalid steps, folds completed sub-trajectories, and preserves a compact, high-salience reasoning backbone under a fixed context budget. Together, these mechanisms enable explicit cognitive control over reasoning trajectories rather than passive context accumulation. We evaluate MemoBrain on challenging long-horizon benchmarks, including GAIA, WebWalker, and BrowseComp-Plus, demonstrating consistent improvements over strong baselines.
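The prune and fold operations described above can be pictured with a small sketch. This is not the authors' implementation: the `Step` record, the `prune`, `fold`, and `manage` functions, the word-count token estimate, and the rule that steps depending on an invalid step are also dropped are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One reasoning step. Every field name here is hypothetical."""
    step_id: int
    summary: str
    tokens: int                                   # assumed cost of keeping the step in context
    depends_on: list = field(default_factory=list)
    valid: bool = True                            # False once judged a dead end
    subtask: Optional[str] = None                 # sub-trajectory label, if any


def context_tokens(steps):
    """Total assumed context cost of the retained steps."""
    return sum(s.tokens for s in steps)


def prune(steps):
    """Drop invalid steps and, by assumption, anything that depends on them."""
    dropped = {s.step_id for s in steps if not s.valid}
    changed = True
    while changed:  # propagate along dependency edges until nothing changes
        changed = False
        for s in steps:
            if s.step_id not in dropped and any(d in dropped for d in s.depends_on):
                dropped.add(s.step_id)
                changed = True
    return [s for s in steps if s.step_id not in dropped]


def fold(steps, subtask, conclusion):
    """Replace a completed sub-trajectory with a single summary step."""
    members = [s for s in steps if s.subtask == subtask]
    if not members:
        return steps
    member_ids = {s.step_id for s in members}
    # The summary step keeps only dependencies that point outside the sub-trajectory.
    external = sorted({d for s in members for d in s.depends_on if d not in member_ids})
    folded = Step(members[-1].step_id, conclusion,
                  tokens=len(conclusion.split()),  # crude word-count estimate (assumption)
                  depends_on=external, subtask=subtask)
    out = []
    for s in steps:
        if s.step_id in member_ids:
            if s.step_id == folded.step_id:
                out.append(folded)
            continue
        # Redirect dependencies on folded members to the summary step.
        deps = sorted({folded.step_id if d in member_ids else d for d in s.depends_on})
        out.append(Step(s.step_id, s.summary, s.tokens, deps, s.valid, s.subtask))
    return out


def manage(steps, budget, completed):
    """Prune first, then fold completed sub-trajectories until the budget is met."""
    steps = prune(steps)
    for subtask, conclusion in completed.items():
        if context_tokens(steps) <= budget:
            break
        steps = fold(steps, subtask, conclusion)
    return steps


# Illustrative trace; the numbers are made up.
trace = [
    Step(1, "Parse the question", 40),
    Step(2, "Search for candidate pages", 120, [1], subtask="search"),
    Step(3, "Open a page that turns out irrelevant", 300, [2], valid=False, subtask="search"),
    Step(4, "Open the relevant page and extract the date", 250, [2], subtask="search"),
    Step(5, "Combine the date with the question constraints", 60, [1, 4]),
]
compact = manage(trace, budget=200,
                 completed={"search": "Search found the date on the relevant page"})
print([s.step_id for s in compact], context_tokens(compact))
```

In this toy trace the dead-end step 3 is pruned and the two remaining search steps collapse into one summary node that still links back to step 1, so the dependency backbone survives while the context cost falls under the assumed budget. A real system would need a tokenizer-based cost and a learned or prompted judgment of which steps are invalid or complete; both are stubbed out here.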