Improving Language Agents through BREW

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based agents face challenges in structured reasoning, tool invocation, and environmental adaptation—including high training overhead, slow convergence, opaque decision policies, and difficulty in iterative optimization. This paper introduces BREW, a novel framework that abandons conventional weight fine-tuning in favor of constructing an editable, extensible experience knowledge base. BREW enables transparent and controllable agent behavior optimization via task-level scoring, behavior-specification-driven knowledge distillation, and noise-robust state-space search. Key techniques include dynamic knowledge-base retrieval, chunked experience memory storage, and behavior-rule-guided policy generation. Evaluated on OSWorld, τ²Bench, and SpreadsheetBench, BREW improves task accuracy by 10–20%, reduces API calls by 10–15%, and significantly accelerates execution—while maintaining computational overhead comparable to baseline models.

📝 Abstract
Large Language Model (LLM)-based agents are increasingly applied to tasks requiring structured reasoning, tool use, and environmental adaptation, such as data manipulation, multistep planning, and computer-use automation. However, despite their versatility, current weight-optimization training paradigms such as PPO and GRPO remain relatively impractical, owing to the high computational overhead required for rollouts to converge. In addition, the resulting agent policies are difficult to interpret, adapt, or incrementally improve. To address this, we investigate creating and refining a structured memory of what an agent learns experientially from its environment as an alternative route to agent optimization. We introduce BREW (Bootstrapping expeRientially-learned Environmental knoWledge), a framework for optimizing agents on downstream tasks via KB construction and refinement. In our formulation, we introduce an effective method for partitioning agent memory for more efficient retrieval and refinement. BREW uses task graders and behavior rubrics to learn insights, leveraging state-space search to ensure robustness to the noise and non-specificity of natural language. Empirical results on real-world, domain-grounded benchmarks -- OSWorld, τ²Bench, and SpreadsheetBench -- show that BREW achieves a 10–20% improvement in task precision and a 10–15% reduction in API/tool calls, leading to faster execution, all while maintaining computational efficiency on par with base models. Unlike prior work where memory is treated as static context, we establish the KB as a modular and controllable substrate for agent optimization -- an explicit lever for shaping behavior in a transparent, interpretable, and extensible manner.
Problem

Research questions and friction points this paper is trying to address.

Optimizing language agents with experiential memory instead of weight training
Improving task precision while reducing computational overhead and tool calls
Creating modular knowledge bases for transparent and controllable agent behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrapping experiential knowledge for agent optimization
Partitioning agent memory for efficient retrieval
Using task graders and search for robustness
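The ideas above (a partitioned experience KB, grader-gated insight storage, and retrieval over memory chunks) can be sketched as a toy data structure. Everything here is an illustrative assumption rather than BREW's actual implementation: the class names, the grader-score threshold, and the naive keyword-overlap ranker (standing in for whatever retrieval the paper uses) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Experience:
    task_type: str   # partition key, e.g. "spreadsheet" or "browser"
    insight: str     # distilled behavior rule in natural language
    score: float     # task-grader score of the trajectory it was mined from

class ExperienceKB:
    """Toy experience knowledge base, partitioned by task type."""

    def __init__(self, min_score: float = 0.5):
        self.min_score = min_score
        self.partitions: dict[str, list[Experience]] = {}

    def add(self, exp: Experience) -> bool:
        # Grader-gated insertion: only keep insights distilled from
        # trajectories whose grader score clears the threshold.
        if exp.score < self.min_score:
            return False
        self.partitions.setdefault(exp.task_type, []).append(exp)
        return True

    def retrieve(self, task_type: str, query: str, k: int = 3) -> list[str]:
        # Partition first, then rank by keyword overlap with the query
        # (a crude stand-in for a real retriever).
        pool = self.partitions.get(task_type, [])
        q = set(query.lower().split())
        ranked = sorted(
            pool,
            key=lambda e: len(q & set(e.insight.lower().split())),
            reverse=True,
        )
        return [e.insight for e in ranked[:k]]
```

Partitioning by task type means retrieval only scores entries in the relevant memory chunk, which is the efficiency argument the abstract makes; the grader gate is one simple way task-level scoring could decide which experiences enter the KB at all.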