🤖 AI Summary
This work addresses the challenge of efficiently balancing performance and computational cost in the runtime memory mechanisms of large language model (LLM) agents, where prior systems either rely on offline, query-agnostic memory construction or incur heavy runtime overhead. To this end, we propose BudgetMem, a novel framework that introduces a query-aware three-tier budget routing mechanism. BudgetMem explicitly decouples and controls three distinct budget dimensions (implementation complexity, reasoning behavior, and model capacity) at a fine-grained level. The framework employs a lightweight neural policy, trained via reinforcement learning, to dynamically allocate low, medium, and high budget tiers within a modular memory architecture. Experimental results on the LoCoMo, LongMemEval, and HotpotQA benchmarks demonstrate that BudgetMem not only outperforms strong baselines under high-budget conditions but also significantly advances the accuracy-cost Pareto frontier under stringent low-budget constraints.
📝 Abstract
Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Runtime memory utilization is a natural alternative, but prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present \textbf{BudgetMem}, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (\textsc{Low}/\textsc{Mid}/\textsc{High}). A lightweight router, implemented as a compact neural policy trained with reinforcement learning, performs budget-tier routing across modules to balance task performance against memory construction cost. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., the high-budget setting) and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of the different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
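To make the budget-tier routing idea concrete, the sketch below shows a greedy per-module tier assignment under a total cost budget. This is a minimal illustration only: the module names, tier costs, and utility scores are invented assumptions, and the paper's actual router is a learned neural policy rather than this greedy rule.

```python
# Illustrative sketch of per-module budget-tier routing (assumed names
# and costs; not the paper's implementation, which uses an RL-trained
# neural policy to produce the routing decision).

TIERS = {"low": 1.0, "mid": 2.5, "high": 5.0}   # assumed relative tier costs
MODULES = ["extract", "index", "retrieve"]       # hypothetical memory modules


def route(scores, budget):
    """Greedily pick the highest-utility affordable tier for each module.

    scores: {module: {tier: predicted utility}} -- in BudgetMem this
    signal would come from the learned router; here it is given directly.
    Reserves enough budget so every remaining module can still run at
    the Low tier.
    """
    plan, spent = {}, 0.0
    for i, module in enumerate(MODULES):
        # Budget reserved so the remaining modules can at least run Low.
        reserve = TIERS["low"] * (len(MODULES) - i - 1)
        affordable = [t for t in TIERS if spent + TIERS[t] + reserve <= budget]
        tier = max(affordable, key=lambda t: scores[module][t]) if affordable else "low"
        plan[module] = tier
        spent += TIERS[tier]
    return plan, spent


if __name__ == "__main__":
    scores = {m: {"low": 0.2, "mid": 0.5, "high": 0.9} for m in MODULES}
    # With a mid-sized budget the router mixes tiers across modules.
    print(route(scores, budget=6.0))
    # prints ({'extract': 'mid', 'index': 'mid', 'retrieve': 'low'}, 6.0)
```

Under a loose budget the greedy rule picks \textsc{High} everywhere, and under a tight budget it degrades gracefully to \textsc{Low}, mirroring the accuracy-cost trade-off the framework is designed to expose.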