ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

📅 2025-09-29
🤖 AI Summary
Large language model (LLM) agents struggle to learn continually from past interactions across sequential tasks and often repeat earlier errors. To address this, we propose ReasoningBank, a memory framework that (1) has the agent judge its own trajectories as successful or failed; (2) distills generalizable reasoning strategies from both kinds of experience; and (3) adds memory-aware test-time scaling (MaTTS), which establishes a positive feedback loop between memory quality and experience accumulation. ReasoningBank combines memory retrieval with dynamic memory updating, so agents can evolve autonomously during inference. On web navigation and software engineering benchmarks, it consistently outperforms baselines that store only raw interaction logs or success-only trajectories, improving both effectiveness and efficiency. The results support the feasibility and scalability of agent architectures with emergent, self-evolving capabilities.

📝 Abstract
As large language model agents are increasingly adopted in persistent real-world roles, they naturally encounter continuous streams of tasks. A key limitation, however, is their failure to learn from the accumulated interaction history, forcing them to discard valuable insights and repeat past errors. We propose ReasoningBank, a novel memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time. Building on this powerful experience learner, we further introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience. By allocating more compute to each task, the agent generates abundant, diverse experiences that provide rich contrastive signals for synthesizing higher-quality memory. The better memory in turn guides more effective scaling, establishing a powerful synergy between memory and test-time scaling. Across web browsing and software engineering benchmarks, ReasoningBank consistently outperforms existing memory mechanisms that store raw trajectories or only successful task routines, improving both effectiveness and efficiency; MaTTS further amplifies these gains. These findings establish memory-driven experience scaling as a new scaling dimension, enabling agents to self-evolve, with emergent behaviors arising naturally.
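
The retrieve, act, self-judge, distill, and write-back cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `MemoryItem`, `ReasoningMemory`, `run_task`, and the toy `act`/`judge`/`distill` callables are all hypothetical names, and the word-overlap retrieval is a stand-in assumption for whatever retrieval method the authors actually use. In a real agent, the three callables would be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryItem:
    # Hypothetical schema: a short title plus the distilled strategy text.
    title: str
    content: str
    from_success: bool


def _tokens(text: str) -> set:
    return set(text.lower().split())


@dataclass
class ReasoningMemory:
    items: List[MemoryItem] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 2) -> List[MemoryItem]:
        # Assumption: word overlap stands in for a real similarity search.
        query = _tokens(task)
        scored = [(len(query & _tokens(m.title + " " + m.content)), m)
                  for m in self.items]
        scored = [(s, m) for s, m in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:k]]

    def add(self, new_items: List[MemoryItem]) -> None:
        self.items.extend(new_items)


def run_task(task: str, memory: ReasoningMemory,
             act: Callable, judge: Callable, distill: Callable) -> bool:
    hints = memory.retrieve(task)                   # 1. retrieve relevant memories
    trajectory = act(task, hints)                   # 2. interact, guided by them
    success = judge(task, trajectory)               # 3. self-judge the outcome
    memory.add(distill(task, trajectory, success))  # 4. distill and write back
    return success


# Toy stand-ins for the LLM-backed components (illustrative only).
def toy_act(task, hints):
    return f"steps for {task} using {len(hints)} hints"


def toy_judge(task, trajectory):
    return "checkout" in task


def toy_distill(task, trajectory, success):
    label = "Strategy" if success else "Pitfall"
    return [MemoryItem(title=f"{label}: {task}", content=trajectory,
                       from_success=success)]
```

Note that failed trajectories are also distilled and stored, here as "Pitfall" items, which mirrors the abstract's point that both successful and failed experiences feed the memory.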

Problem

Research questions and friction points this paper is trying to address.

Enabling agents to learn from accumulated interaction history
Distilling reasoning strategies from successful and failed experiences
Accelerating learning through memory-aware test-time scaling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory framework distills reasoning strategies from experiences
Memory-aware test-time scaling accelerates learning process
Synergy between memory and scaling enables agent self-evolution
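
To make the idea of memory-aware test-time scaling more concrete, the sketch below spends extra compute on one task by sampling several rollouts, splits them by self-judged outcome, and contrasts the two groups to find a candidate memory. It is only an illustration under stated assumptions: `scaled_rollouts` and `contrast` are hypothetical helpers, trajectories are modeled as plain lists of step strings, and the set-difference contrast is a toy stand-in for the LLM-based synthesis the paper describes. Whether rollouts are sampled independently, as here, or refined one after another is not specified in this summary.

```python
import random
from typing import Callable, List, Tuple

Trajectory = List[str]  # assumption: a trajectory is a list of step descriptions


def scaled_rollouts(task: str, act: Callable, judge: Callable,
                    k: int = 4, seed: int = 0
                    ) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Run k rollouts of one task and split them by self-judged outcome."""
    rng = random.Random(seed)  # passed to act so sampling can be diversified
    successes: List[Trajectory] = []
    failures: List[Trajectory] = []
    for _ in range(k):
        trajectory = act(task, rng)
        if judge(task, trajectory):
            successes.append(trajectory)
        else:
            failures.append(trajectory)
    return successes, failures


def contrast(successes: List[Trajectory],
             failures: List[Trajectory]) -> List[str]:
    """Toy contrastive signal: steps in every success and in no failure."""
    if not successes:
        return []
    common = set(successes[0])
    for trajectory in successes[1:]:
        common &= set(trajectory)
    for trajectory in failures:
        common -= set(trajectory)
    return sorted(common)
```

The steps returned by `contrast` could then be written back to memory as a candidate strategy, so a larger `k` gives the distillation step more evidence to work with.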