🤖 AI Summary
This work addresses how long-term conversational agents can efficiently decide which dialogue turns to store in external memory. Existing approaches invoke a large language model (LLM) autoregressively at every turn to make that decision, which adds latency and redundant computation. The authors propose MemRouter, a write-side, embedding-based router that decouples memory admission from response generation. MemRouter encodes each turn together with recent context, passes the embeddings through a frozen LLM backbone, and predicts admission with lightweight classification heads, training only 12M parameters under supervision. In a matched-harness comparison on the LoCoMo benchmark, MemRouter raises overall F1 from 45.6 to 52.0 relative to an LLM-based memory manager and cuts memory-management p50 latency from 970 ms to 58 ms (roughly 17x lower). Descriptive factorial averaging further attributes a +10.3 mean F1 gain to learned admission over random storage.
📝 Abstract
Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.
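To make the write-side routing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a frozen encoder whose weights are never updated and a small supervised head that outputs a store/skip decision. A hashed bag-of-words function stands in for the frozen LLM backbone, and a single logistic unit stands in for the classification heads. All names (`frozen_encode`, `AdmissionHead`, `route`, `EXAMPLES`) and the toy labels are hypothetical.

```python
import hashlib
import math

DIM = 256  # embedding width of the toy encoder (assumption, not from the paper)


def frozen_encode(turn, context):
    """Stand-in for a frozen backbone: hashed bag-of-words, never trained.

    The current turn is weighted 1.0 and recent context 0.5 (arbitrary choice).
    """
    vec = [0.0] * DIM
    for weight, text in ((1.0, turn), (0.5, " ".join(context))):
        for token in text.lower().split():
            idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
            vec[idx] += weight
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class AdmissionHead:
    """Logistic-regression head: the only trainable component in this sketch."""

    def __init__(self, dim=DIM):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, xs, ys, lr=0.5, epochs=200):
        # Plain SGD on the logistic loss; the encoder is untouched.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.prob(x) - y
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


def route(head, turn, context, threshold=0.5):
    """Return True if the turn should be written to external memory."""
    return head.prob(frozen_encode(turn, context)) >= threshold


# Hypothetical supervision: 1 = store, 0 = skip.
EXAMPLES = [
    ("my sister moved to Lisbon last month", 1),
    ("I am allergic to peanuts", 1),
    ("my birthday is on March 3rd", 1),
    ("ok thanks", 0),
    ("haha nice", 0),
    ("sure sounds good", 0),
]

head = AdmissionHead()
head.fit([frozen_encode(text, []) for text, _ in EXAMPLES],
         [label for _, label in EXAMPLES])

memory = [text for text, _ in EXAMPLES if route(head, text, [])]
print(memory)
```

The design point the sketch illustrates is that the admission decision costs one forward pass through a fixed encoder plus a small head, with no token-by-token decoding, and that the answer model never appears in the write path.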