Lightweight LLM Agent Memory with Small Language Models

📅 2026-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the performance degradation of large language model (LLM) agents during prolonged interactions, often caused by unstable memory systems or accumulated latency. The authors propose LightMem, a novel framework that introduces a small language model to construct a lightweight, modular memory system partitioned into short-, medium-, and long-term components, with decoupled online retrieval and offline consolidation pipelines. Key innovations include a two-stage online retrieval mechanism combining vector-based coarse search with semantic re-ranking, user-aware memory isolation via user identifiers, and an incremental strategy for long-term memory integration. Evaluated on the LoCoMo benchmark, LightMem achieves an average F1 score improvement of 2.5, with a median retrieval latency of only 83 ms and end-to-end latency of 581 ms, effectively balancing efficacy and efficiency.
📝 Abstract
Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. Retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering; in contrast, many systems make repeated large-model calls for online memory operations, improving accuracy but accumulating latency over long interactions. We propose LightMem, a lightweight memory system for better agent memory driven by Small Language Models (SLMs). LightMem modularizes memory retrieval, writing, and long-term consolidation, and separates online processing from offline consolidation to enable efficient memory invocation under bounded compute. It organizes memory into short-term memory (STM) for immediate conversational context, mid-term memory (MTM) for reusable interaction summaries, and long-term memory (LTM) for consolidated knowledge, and uses user identifiers to support independent retrieval and incremental maintenance in multi-user settings. Online, LightMem operates under a fixed retrieval budget and selects memories via a two-stage procedure: vector-based coarse retrieval followed by semantic consistency re-ranking. Offline, it abstracts reusable interaction evidence and incrementally integrates it into LTM. Experiments show gains across model scales, with an average F1 improvement of about 2.5 on LoCoMo at low median latency (83 ms retrieval; 581 ms end-to-end).
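The abstract's online path can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: stage 1 does coarse top-k retrieval by cosine similarity over memory embeddings, stage 2 re-ranks the candidates by a semantic-consistency score, and a fixed budget caps what is returned. The `vec`/`text` memory fields and the term-overlap re-ranker are assumptions standing in for the paper's real embeddings and SLM-based re-ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_retrieve(query_vec, query_terms, memories, budget=2, coarse_k=4):
    """Toy sketch of LightMem-style two-stage retrieval.

    `memories` is a list of dicts with hypothetical keys:
      'vec'  -- the memory's embedding
      'text' -- the memory's content
    The re-ranker below (query-term overlap) is a stand-in for the
    paper's SLM-based semantic consistency check.
    """
    # Stage 1: coarse retrieval -- top-k memories by embedding similarity.
    coarse = sorted(
        memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True
    )[:coarse_k]

    # Stage 2: re-rank candidates by a semantic-consistency proxy
    # (fraction of query terms that appear in the memory text).
    def consistency(m):
        words = set(m["text"].lower().split())
        return sum(t in words for t in query_terms) / max(len(query_terms), 1)

    reranked = sorted(coarse, key=consistency, reverse=True)

    # A fixed retrieval budget bounds the online compute per query.
    return reranked[:budget]
```

The fixed `budget` mirrors the abstract's "bounded compute" claim: however many memories exist, only `coarse_k` candidates are scored and at most `budget` are passed to the agent.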
Problem

Research questions and friction points this paper is trying to address.

LLM agent memory
retrieval-based memory
online latency
cross-turn consistency
long-horizon interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

LightMem
Small Language Models
Agent Memory
Memory Modularization
Two-stage Retrieval
Jiaquan Zhang
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Chaoning Zhang
Professor at UESTC (University of Electronic Science and Technology of China)
Computer Vision · LLM and VLM · GenAI and AIGC Detection
Shuxu Chen
Department of Electronics and Information Convergence Engineering, Kyung Hee University
Zhenzhen Huang
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Pengcheng Zheng
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Zhicheng Wang
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Ping Guo
City University of Hong Kong
Computational Intelligence · Artificial Intelligence · Optimization
Fan Mo
University of Oxford; University of Cambridge
AI4Science · digital twin · graph neural network · intelligent manufacturing · robotics
Sung-Ho Bae
School of Computing, Kyung Hee University
Jie Zou
University of Electronic Science and Technology of China
Information Retrieval · Natural Language Processing · Recommender Systems · Multimedia
Jiwei Wei
Professor at University of Electronic Science and Technology of China (UESTC)
Cross-Modal Retrieval · Metric Learning · Adversarial Machine Learning · AIGC
Yang Yang
School of Computer Science and Engineering, University of Electronic Science and Technology of China