MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a realistic, user-centered benchmark for evaluating long-context memory in large language models (LLMs) across diverse domains over extended timeframes. To this end, we introduce MemoryCD, the first large-scale memory evaluation benchmark grounded in real-world, multi-year, cross-domain user interactions derived from the Amazon Review dataset. We construct a multi-dimensional evaluation pipeline to assess 14 prominent LLMs and 6 representative memory mechanisms across four personalized task types spanning 12 distinct domains. Our results reveal that current memory approaches exhibit substantial performance deficiencies in authentic cross-domain scenarios. MemoryCD establishes the first standardized, user-centric platform for advancing research on lifelong, cross-domain personalization in LLMs.
📝 Abstract
Recent advances in Large Language Models (LLMs) have expanded context windows to million-token scales, yet benchmarks for evaluating memory remain limited to short-session synthetic dialogues. We introduce MemoryCD, the first large-scale, user-centric, cross-domain memory benchmark derived from lifelong real-world behaviors in the Amazon Review dataset. Unlike existing memory datasets that rely on scripted personas to generate synthetic user data, MemoryCD tracks authentic user interactions across years and multiple domains. We construct a multi-faceted long-context memory evaluation pipeline that pairs 14 state-of-the-art LLM base models with 6 memory method baselines on 4 distinct personalization tasks over 12 diverse domains, evaluating an agent's ability to simulate real user behaviors in both single-domain and cross-domain settings. Our analysis reveals that existing memory methods remain far from satisfying users in various domains, and MemoryCD offers the first testbed for evaluating cross-domain lifelong personalization.
Problem

Research questions and friction points this paper is trying to address.

long-context memory
lifelong personalization
cross-domain
LLM agents
user memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

long-context memory
cross-domain personalization
lifelong user modeling
LLM agent evaluation
real-world user behavior