RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

📅 2026-05-15

🤖 AI Summary

This work addresses the high token cost of existing agent memory systems. These systems invoke large language models (LLMs) to extract memory from every incoming interaction, which makes long-term deployment expensive. To mitigate this, the authors propose a memory consolidation mechanism grounded in interaction recurrence: LLM-based memory extraction and summarization are triggered only when semantically similar interactions recur persistently. Until then, interactions are kept in a subconscious memory layer and indexed with a lightweight embedding model. Semantic clustering detects the recurrence, and a refinement module recovers fine-grained factual details that the extraction step omits. Experiments show that the method reduces the token cost of memory construction in three state-of-the-art memory systems by up to 87% while exceeding their accuracy.
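To make the "consolidate only on recurrence" idea concrete, here is a minimal sketch under stated assumptions. It is not the RecMem implementation. A bag-of-words vector stands in for the lightweight embedding model. Greedy nearest-centroid assignment stands in for semantic clustering. A placeholder `llm_summarize` that joins texts stands in for the LLM extraction call. The names `RecurrenceMemory`, `sim_threshold`, and `recurrence_threshold` and their values are all hypothetical. The point is the cost asymmetry: every interaction pays only for embedding and cluster assignment, and an LLM call happens once a cluster has seen enough similar interactions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a lightweight embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    centroid: Counter  # summed member vectors; cosine similarity ignores scale
    members: List[str] = field(default_factory=list)
    consolidated: bool = False


class RecurrenceMemory:
    """Sketch of recurrence-triggered consolidation (thresholds are hypothetical)."""

    def __init__(self, sim_threshold: float = 0.5, recurrence_threshold: int = 3,
                 llm_summarize: Optional[Callable[[List[str]], str]] = None):
        self.sim_threshold = sim_threshold
        self.recurrence_threshold = recurrence_threshold
        # Placeholder for the LLM extraction call; here it only joins the texts.
        self.llm_summarize = llm_summarize or (lambda texts: " | ".join(texts))
        self.clusters: List[Cluster] = []  # subconscious layer, grouped by similarity
        self.memories: List[str] = []      # consolidated memory from the "LLM"
        self.llm_calls = 0

    def add(self, interaction: str) -> None:
        vec = embed(interaction)
        # Cheap step, run for every interaction: assign to the nearest cluster.
        best = max(self.clusters, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is not None and cosine(vec, best.centroid) >= self.sim_threshold:
            best.members.append(interaction)
            best.centroid.update(vec)
        else:
            best = Cluster(centroid=Counter(vec), members=[interaction])
            self.clusters.append(best)
        # Expensive step, run only on sustained recurrence: one LLM call per cluster.
        if not best.consolidated and len(best.members) >= self.recurrence_threshold:
            self.memories.append(self.llm_summarize(best.members))
            self.llm_calls += 1
            best.consolidated = True


mem = RecurrenceMemory(sim_threshold=0.5, recurrence_threshold=3)
for msg in ["i like hiking in the alps", "what is the weather today",
            "i went hiking in the alps again", "hiking in the alps was great"]:
    mem.add(msg)
print(mem.llm_calls, len(mem.clusters))  # 1 2
```

In this toy run, four interactions trigger a single summarization call. An eager scheme would make four. The one-off weather question stays unconsolidated in the subconscious layer.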

📝 Abstract

Memory systems often organize user-agent interactions as retrievable external memory and are crucial for long-running agents because they overcome the limited context windows of LLMs. However, existing memory systems invoke LLMs to process every incoming interaction for memory extraction, and this eager memory consolidation scheme leads to substantial token consumption. To tackle this problem, we propose RecMem, which rethinks when memory consolidation should be conducted. RecMem stores incoming interactions in a subconscious memory layer and encodes them with lightweight embedding models for retrieval. LLMs are invoked to extract episodic and semantic memory only when sustained recurrence is observed among semantically similar interactions. Such recurrence-based consolidation works because these interactions form a semantic cluster with rich information, which makes them worth extracting and summarizing. To improve accuracy, RecMem also incorporates a semantic refinement mechanism that recovers the fine-grained facts omitted by memory extraction. Experiments show that RecMem reduces the memory construction token cost of three SOTA memory systems by up to 87% while exceeding their accuracy.
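The abstract does not say how semantic refinement works. One plausible reading is sketched below, purely as an assumption. Retrieval returns the closest consolidated summary and also the closest raw interactions from the subconscious layer. Details that summarization dropped, such as names or dates, can then still reach the agent's context. The bag-of-words `embed`, the `retrieve` function, and the `k_raw` parameter are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a lightweight embedding model (bag of words).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, summaries: List[str], raw_interactions: List[str],
             k_raw: int = 1) -> Tuple[Optional[str], List[str]]:
    q = embed(query)
    # Consolidated (episodic/semantic) layer: pick the closest summary.
    best_summary = max(summaries, key=lambda s: cosine(q, embed(s)), default=None)
    # Assumed refinement step: also return the raw interactions closest to the
    # query, so fine-grained facts the summary omitted are not lost.
    ranked = sorted(raw_interactions, key=lambda r: cosine(q, embed(r)), reverse=True)
    return best_summary, ranked[:k_raw]


summaries = ["user enjoys hiking trips", "asks about weather today"]
raw = ["i hiked the matterhorn trail on june 3",
       "what is the weather today",
       "i like hiking in the alps"]
summary, details = retrieve("when did the user hike the matterhorn", summaries, raw)
print(summary)  # user enjoys hiking trips
print(details)  # ['i hiked the matterhorn trail on june 3']
```

Here the summary alone cannot answer the question. The raw interaction it was built from still carries the date.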

Problem

Research questions and friction points this paper is trying to address.

memory consolidation

long-running LLM agents

token consumption

external memory

context window limitation

Innovation

Methods, ideas, or system contributions that make the work stand out.

recurrence-based memory consolidation

subconscious memory layer

lightweight embedding