LightMem: Lightweight and Efficient Memory-Augmented Generation

📅 2025-10-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing large language models (LLMs) struggle to efficiently leverage dynamic interaction histories, while mainstream memory systems incur substantial computational and time overhead. To address this, the authors propose LightMem, a lightweight memory augmentation system organized into three tiers and inspired by the Atkinson-Shiffrin model of human memory: sensory memory with lightweight filtering, topic-aware short-term memory, and long-term memory supporting offline updates. LightMem couples sensory filtering with topic-based grouping and introduces a "sleep-time" consolidation mechanism that decouples memory updating from online inference. Using lightweight compression, topic-level summarization, and offline updates, LightMem is evaluated with GPT- and Qwen-based backbones. On the LongMemEval benchmark, it improves accuracy by up to 10.9% while reducing token consumption by up to 117×, API calls by up to 159×, and runtime by over 12×.

📝 Abstract

Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups the remaining information by topic. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x. The code is available at https://github.com/zjunlp/LightMem.
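To make the first two stages concrete, here is a minimal toy sketch in Python. It is not the LightMem implementation or its API: the stopword filter stands in for the paper's lightweight compression, the keyword lexicon stands in for topic grouping, and the string join stands in for summarization. All names (`sensory_filter`, `assign_topic`, `short_term_summaries`, `TOPIC_KEYWORDS`) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Assumption: a tiny filler-word list stands in for learned compression.
STOPWORDS = {"the", "a", "an", "um", "uh", "so", "please", "is", "to", "of"}

# Assumption: a hand-written lexicon stands in for topic segmentation.
TOPIC_KEYWORDS = {
    "travel": {"flight", "hotel", "paris"},
    "food": {"pasta", "vegetarian", "dinner"},
}


def sensory_filter(message: str) -> list[str]:
    """Stage 1 (toy): drop filler tokens so later stages see less text."""
    tokens = [t.strip(".,!?").lower() for t in message.split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def assign_topic(tokens: list[str]) -> str:
    """Pick the topic whose keyword set overlaps the tokens the most."""
    best, best_overlap = "misc", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(keywords & set(tokens))
        if overlap > best_overlap:
            best, best_overlap = topic, overlap
    return best


def short_term_summaries(messages: list[str]) -> dict[str, str]:
    """Stage 2 (toy): group filtered messages by topic, one entry per topic."""
    groups: dict[str, list[str]] = defaultdict(list)
    for message in messages:
        tokens = sensory_filter(message)
        groups[assign_topic(tokens)].append(" ".join(tokens))
    # A real system would summarize each group; joining is a placeholder.
    return {topic: "; ".join(items) for topic, items in groups.items()}
```

The point of the sketch is the ordering: filtering happens before grouping, so the more expensive per-topic step operates on already-reduced input.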

Problem

Research questions and friction points this paper is trying to address.

Addresses LLMs' inefficiency in leveraging historical interaction data

Reduces computational overhead in memory-augmented generation systems

Balances performance and efficiency through multi-stage memory organization

Innovation

Methods, ideas, or system contributions that make the work stand out.

LightMem organizes memory into three complementary stages

It uses lightweight compression to filter irrelevant information

Offline update decouples consolidation from online inference
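The decoupling of consolidation from online inference can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' code: the class `LongTermMemory` and its methods are hypothetical, and string concatenation stands in for whatever merging the real offline procedure performs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    # Consolidated memory, keyed by topic.
    entries: dict[str, str] = field(default_factory=dict)
    # (topic, summary) pairs written online and not yet consolidated.
    pending: list[tuple[str, str]] = field(default_factory=list)

    def write_online(self, topic: str, summary: str) -> None:
        """Online path: a cheap append, with no merging or deduplication."""
        self.pending.append((topic, summary))

    def sleep_time_update(self) -> int:
        """Offline path: fold pending items into entries; return how many."""
        processed = len(self.pending)
        for topic, summary in self.pending:
            if topic in self.entries:
                # Placeholder merge; a real system would reconcile content.
                self.entries[topic] += " | " + summary
            else:
                self.entries[topic] = summary
        self.pending.clear()
        return processed
```

In this arrangement the inference path only pays for an append, while the costlier merge runs whenever `sleep_time_update` is scheduled, for example during idle periods.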
