HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

📅 2026-02-15
🤖 AI Summary
This work addresses the inefficiency of memory management in large language models during long conversations, where balancing computational efficiency and task performance remains challenging. The authors propose a hybrid memory architecture that integrates dual-granularity memory storage with a dynamic two-level retrieval mechanism. Inspired by cognitive economy principles, their approach employs an on-demand scheduling strategy that coordinates lightweight summarization with an LLM-driven deep reasoning module, further enhanced by reflection-augmented iterative inference. This design overcomes the limitations of monolithic architectures and static retrieval schemes, achieving state-of-the-art performance on the LOCOMO and LongMemEval benchmarks while reducing computational overhead by 92.6% compared to full-context baselines, establishing a more favorable trade-off between efficiency and effectiveness in long-term memory management.

📝 Abstract
Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational overhead for simple queries. The crux lies in the limitations of monolithic memory representations and static retrieval mechanisms, which fail to emulate the flexible and proactive memory scheduling capabilities observed in humans and thus struggle to adapt to diverse problem scenarios. Inspired by the principle of cognitive economy, we propose HyMem, a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. HyMem adopts a dual-granularity storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. Experiments show that HyMem achieves strong performance on both the LOCOMO and LongMemEval benchmarks, outperforming full-context baselines while reducing computational cost by 92.6%, establishing a state-of-the-art balance between efficiency and performance in long-term memory management.
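The dual-granularity store and on-demand two-tier scheduling described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all names (`MemoryStore`, `is_complex`, the keyword-based complexity gate, and the reflection loop) are assumptions; the paper presumably uses learned or LLM-based components for both the scheduler and the deep reasoning module.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Dual-granularity store: compact summaries plus raw dialogue turns.
    (Hypothetical structure, for illustration only.)"""
    summaries: list = field(default_factory=list)
    raw_turns: list = field(default_factory=list)

    def add_turn(self, turn: str, summary: str) -> None:
        self.raw_turns.append(turn)
        self.summaries.append(summary)

def is_complex(query: str) -> bool:
    # Toy complexity gate standing in for the paper's scheduler:
    # multi-hop cue words trigger the deep tier.
    cues = ("why", "compare", "when did", "how many")
    return any(c in query.lower() for c in cues)

def answer(query: str, mem: MemoryStore, max_reflections: int = 2) -> str:
    # Tier 1: cheap summary-level context, always available.
    context = " | ".join(mem.summaries[-5:])
    if not is_complex(query):
        return f"[fast] {query} -> summaries: {context}"
    # Tier 2: deep module over raw turns, selectively activated,
    # with a reflection loop that retries retrieval until evidence suffices.
    evidence = [t for t in mem.raw_turns
                if any(w.lower() in t.lower() for w in query.split())]
    for _ in range(max_reflections):
        if evidence:  # reflection check: stop once evidence is found
            break
        evidence = mem.raw_turns[-3:]  # fallback retrieval pass
    return f"[deep] {query} -> evidence: {evidence}"
```

The point of the sketch is the cost asymmetry: simple queries never touch the raw-turn store, which is where the paper's reported compute savings over full-context baselines would come from.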
Problem

Research questions and friction points this paper is trying to address.

memory management
large language models
long-term dialogue
retrieval efficiency
cognitive economy
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid memory architecture
dynamic retrieval scheduling
multi-granular memory
cognitive economy
LLM agents
Xiaochen Zhao
Research Scientist, ByteDance
Kaikai Wang
Ant Group
Xiaowen Zhang
Ant Group
Chen Yao
Ant Group
Aili Wang
ZJU-UIUC Institute, Zhejiang University