🤖 AI Summary
Language agents struggle with error correction and experience reuse in complex cross-domain tasks. This paper proposes Agent KB, a hierarchical experience framework built around a shared knowledge base that lets agents transfer experience to one another, unifying high-level strategies and low-level execution details through semantic representation and retrieval. Its Reason-Retrieve-Refine pipeline iteratively refines agent behavior by combining large language model reasoning with retrospective analysis of execution logs. The core contribution is breaking down experience silos among agents and establishing a scalable, collaborative learning mechanism. Experiments report gains of up to 16.28 percentage points on the GAIA benchmark; Claude-3 improves from 38.46% to 57.69% on the most challenging tasks and GPT-4 from 53.49% to 73.26% on intermediate tasks; and on SWE-bench code repair, Claude-3 improves from 41.33% to 53.33%.
📝 Abstract
As language agents tackle increasingly complex tasks, they struggle with effective error correction and experience reuse across domains. We introduce Agent KB, a hierarchical experience framework that enables complex agentic problem solving via a novel Reason-Retrieve-Refine pipeline. Agent KB addresses a core limitation: agents traditionally cannot learn from each other's experiences. By capturing both high-level strategies and detailed execution logs, Agent KB creates a shared knowledge base that enables cross-agent knowledge transfer. Evaluated on the GAIA benchmark, Agent KB improves success rates by up to 16.28 percentage points. On the most challenging tasks, Claude-3 improves from 38.46% to 57.69%, while GPT-4 improves from 53.49% to 73.26% on intermediate tasks. On SWE-bench code repair, Agent KB enables Claude-3 to improve from 41.33% to 53.33%. Our results suggest that Agent KB provides a modular, framework-agnostic infrastructure for enabling agents to learn from past experiences and generalize successful strategies to new tasks.