LightMem: Lightweight and Efficient Memory-Augmented Generation

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) struggle to make efficient use of dynamic interaction histories, and mainstream memory systems add substantial computational and time overhead. To address this, we propose LightMem, a lightweight memory-augmentation system inspired by the Atkinson-Shiffrin model of human memory and organized into three stages: sensory memory that filters irrelevant input through lightweight compression and groups it by topic, topic-aware short-term memory that organizes and summarizes those groups, and long-term memory that is updated offline. This “sleep-time” update decouples memory consolidation from online inference. On the LongMemEval benchmark with GPT and Qwen backbones, LightMem improves accuracy by up to 10.9% while reducing token usage by up to 117×, API calls by up to 159×, and runtime by over 12×.

📝 Abstract
Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups the remaining information by topic. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x. The code is available at https://github.com/zjunlp/LightMem.
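The online path described in the abstract (filter, group by topic, summarize) can be illustrated with a toy sketch. This is not the LightMem implementation: the stop-word filter stands in for lightweight compression, keyword overlap stands in for topic grouping, truncation stands in for summarization, and every name here (`FILLER`, `sensory_filter`, `topic_of`, `short_term_summaries`) is hypothetical.

```python
from collections import defaultdict

# Hypothetical filler list; a stand-in for a learned lightweight compressor.
FILLER = {"um", "uh", "well", "so", "the", "a", "an"}


def sensory_filter(turn: str) -> str:
    """Drop filler tokens from one dialogue turn (toy compression)."""
    return " ".join(w for w in turn.split() if w.lower() not in FILLER)


def topic_of(turn: str, topics: dict[str, set[str]]) -> str:
    """Assign a turn to the topic whose keyword set overlaps it most."""
    words = {w.lower().strip(".,!?") for w in turn.split()}
    best = max(topics, key=lambda t: len(words & topics[t]))
    # Fall back to a catch-all bucket when no keyword matches at all.
    return best if words & topics[best] else "misc"


def short_term_summaries(turns, topics, max_words=12):
    """Group filtered turns by topic, then truncate each group (toy summary)."""
    groups = defaultdict(list)
    for turn in turns:
        kept = sensory_filter(turn)
        groups[topic_of(kept, topics)].append(kept)
    return {t: " ".join(" ".join(g).split()[:max_words]) for t, g in groups.items()}
```

Under these assumptions, calling `short_term_summaries` on a list of turns with a topic-to-keywords mapping returns one short string per topic, which is the unit a later stage would store.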
Problem

Research questions and friction points this paper is trying to address.

LLMs make inefficient use of historical interaction data
Existing memory-augmented generation systems carry high computational overhead
Memory systems need to balance performance against efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LightMem organizes memory into three complementary stages: sensory, short-term, and long-term
Sensory memory uses lightweight compression to filter irrelevant information and groups content by topic
An offline "sleep-time" update decouples consolidation from online inference
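The decoupling of consolidation from online inference can be pictured as an append-only write path plus a separate batch step. The sketch below is an illustration under stated assumptions, not the paper's procedure: the class and method names (`LongTermMemory`, `write`, `read`, `sleep_update`) are hypothetical, and the "newest summary wins" merge rule is a placeholder for whatever consolidation the real system performs.

```python
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    entries: dict[str, str] = field(default_factory=dict)          # consolidated store
    pending: list[tuple[str, str]] = field(default_factory=list)   # unmerged writes

    def write(self, key: str, summary: str) -> None:
        """Online path: append only, with no merging work at inference time."""
        self.pending.append((key, summary))

    def read(self, key: str) -> list[str]:
        """Online path: the consolidated entry plus any not-yet-merged writes."""
        hits = [self.entries[key]] if key in self.entries else []
        return hits + [s for k, s in self.pending if k == key]

    def sleep_update(self) -> int:
        """Offline path: fold pending writes into the store.

        Assumed merge rule: the newest summary for a key replaces older ones.
        Returns the number of pending writes processed.
        """
        merged = len(self.pending)
        for key, summary in self.pending:
            self.entries[key] = summary
        self.pending.clear()
        return merged
```

The design point is that `write` and `read` stay cheap while a conversation is in progress, and the costlier merge runs only when `sleep_update` is called between sessions.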