RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

📅 2026-05-15

🤖 AI Summary
This work addresses the high token cost of existing agent memory systems, which invoke large language models (LLMs) for memory extraction on every incoming interaction and are therefore expensive to deploy over long periods. To mitigate this, the authors propose a memory consolidation mechanism driven by recurrence: LLM-based memory extraction and summarization are triggered only when semantically similar interactions recur in a sustained way. Until then, interactions are held in a subconscious memory layer and encoded with a lightweight embedding model for retrieval, and a semantic refinement mechanism recovers fine-grained facts that extraction omits. In the reported experiments, the method reduces token consumption during memory construction by up to 87% relative to three state-of-the-art memory systems while exceeding their accuracy.
📝 Abstract
Memory systems often organize user-agent interactions as retrievable external memory and are crucial for long-running agents because they overcome the limited context windows of LLMs. However, existing memory systems invoke LLMs to process every incoming interaction for memory extraction, and this eager memory consolidation scheme leads to substantial token consumption. To tackle this problem, we propose RecMem, which rethinks when memory consolidation should be conducted. RecMem stores incoming interactions in a subconscious memory layer and encodes them using lightweight embedding models for retrieval. LLMs are invoked to extract episodic and semantic memory only when sustained recurrence is observed for semantically similar interactions. Such recurrence-based consolidation works because these interactions correspond to a semantic cluster with rich information and are thus worth extracting and summarizing. To improve accuracy, RecMem also incorporates a semantic refinement mechanism that recovers the fine-grained facts omitted by memory extraction. Experiments show that RecMem reduces the memory construction token cost of three SOTA memory systems by up to 87% while exceeding their accuracy.
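The recurrence-triggered idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a lightweight embedding model, the greedy threshold clustering and the fixed `recurrence_threshold` are assumed simplifications of "sustained recurrence", the `summarize` callable is a hypothetical placeholder for an LLM call, and the semantic refinement mechanism is not modeled.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for a lightweight embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    centroid: Counter = field(default_factory=Counter)  # running sum of member vectors
    members: List[str] = field(default_factory=list)
    consolidated: bool = False


class RecurrenceMemory:
    """Hypothetical sketch: buffer interactions cheaply, summarize only on recurrence."""

    def __init__(self, summarize: Callable[[List[str]], str],
                 sim_threshold: float = 0.5, recurrence_threshold: int = 3):
        self.summarize = summarize                    # placeholder for an LLM call
        self.sim_threshold = sim_threshold            # assumed similarity cutoff
        self.recurrence_threshold = recurrence_threshold  # assumed recurrence count
        self.clusters: List[Cluster] = []             # the cheap buffer layer
        self.consolidated: List[str] = []             # LLM-extracted memories
        self.llm_calls = 0

    def observe(self, interaction: str) -> None:
        vec = embed(interaction)
        # Find the most similar existing cluster, if any.
        best = max(self.clusters, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is None or cosine(vec, best.centroid) < self.sim_threshold:
            best = Cluster()
            self.clusters.append(best)
        best.centroid.update(vec)
        best.members.append(interaction)
        # Invoke the (expensive) summarizer once the cluster has recurred often enough.
        if not best.consolidated and len(best.members) >= self.recurrence_threshold:
            self.consolidated.append(self.summarize(best.members))
            self.llm_calls += 1
            best.consolidated = True
```

In this sketch, a one-off interaction never reaches the summarizer; it stays retrievable only through its embedding. An eager scheme, by contrast, would call `summarize` once per interaction. For simplicity the sketch consolidates a cluster only once and does not revisit it as new members arrive.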
Problem

Research questions and friction points this paper is trying to address.

memory consolidation
long-running LLM agents
token consumption
external memory
context window limitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

recurrence-based memory consolidation
subconscious memory layer
lightweight embedding
semantic refinement
long-running LLM agents
Authors

Zijie Dai
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Shiyuan Deng
Huawei Cloud
Sheng Guan
Beijing University of Posts and Telecommunications
Large Language Models
Yizhou Tian
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Xin Yao
Huawei Theory Lab
Xiao Yan
Wuhan University
Systems for Data Processing
James Cheng
Department of Computer Science and Engineering, The Chinese University of Hong Kong