RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing memory benchmarks for large language models: they are largely confined to static dialogue scenarios and do not assess memory in long-term, project-based interactions where goals evolve. To bridge this gap, we propose RealMem, the first memory-driven interactive benchmark grounded in realistic project contexts. It comprises over 2,000 cross-session dialogues across 11 task categories and introduces an evaluation paradigm tailored to long-term project evolution. To simulate how memory evolves over time, we build a synthesis system that integrates project construction, multi-agent dialogue generation, and memory scheduling. Experiments show that current models struggle to track long-term project states and dynamic contextual dependencies, which points to directions for future memory architecture design.

📝 Abstract
As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture **"long-term project-oriented"** interactions where agents must track evolving goals. To bridge this gap, we introduce **RealMem**, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at [https://github.com/AvatarMemory/RealMemBench](https://github.com/AvatarMemory/RealMemBench).
Problem

Research questions and friction points this paper is trying to address.

long-term memory
project-oriented interaction
memory-driven dialogue
context consistency
real-world scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

RealMem
long-term memory
project-oriented interaction
multi-agent dialogue generation
memory benchmark
Authors

Haonan Bian
Xidian University

Zhiyuan Yao
Ph.D. in Financial Engineering, Stevens Institute of Technology
Reinforcement Learning, Machine Learning, ML/RL in Financial Trading

Sen Hu
Peking University

Zishan Xu
Tsinghua University

Shaolei Zhang
Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS)
Natural Language Processing, Large Language Model, Multimodal LLMs, Simultaneous Translation

Yifu Guo
Sun Yat-sen University

Ziliang Yang
Xidian University

Xueran Han
Renmin University of China

Huacan Wang
University of the Chinese Academy of Sciences

Ronghao Chen
Peking University