KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

📅 2026-01-08
🏛️ arXiv.org
🤖 AI Summary
Existing long-term memory benchmarks predominantly rely on multi-turn dialogues or synthetic user histories, which inadequately capture a model's capacity for deep human understanding. This work proposes the first public benchmark grounded in long-form autobiographical narratives, integrating dense evidence from behaviors, contextual details, and internal mental states to construct a temporally anchored, flashback-aware evaluation pipeline. The benchmark features question-answering tasks that require cross-temporal evidence integration, moving beyond conventional reliance on retrieval accuracy alone. It introduces novel evaluation mechanisms centered on narrative reconstruction, evidence linking, and retrieval-augmented reasoning. Experimental results demonstrate that while current retrieval-augmented systems perform well on factual recall, they exhibit significant limitations in temporal reasoning and higher-order attribution of psychological states.

📝 Abstract
Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
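
As a rough illustration of what a flashback-aware, time-anchored stream could look like, the Python sketch below reorders narrated events by when they happened and flags events that were narrated out of chronological order. The `Event` type, its fields, and the `to_time_anchored_stream` function are hypothetical names chosen for this example; they are not taken from the benchmark's code or data format, and the actual pipeline may work differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical narrative event; not the benchmark's actual schema."""
    narration_index: int  # position of the event in the text as written
    story_date: date      # assumed time anchor: when the event happened
    text: str


def to_time_anchored_stream(events):
    """Return (event, is_flashback) pairs sorted by story time.

    An event counts as a flashback here if it is narrated after an
    event that happened later in story time.
    """
    flagged = []
    latest_seen = None
    # Walk the events in narration order to detect backward jumps in time.
    for ev in sorted(events, key=lambda e: e.narration_index):
        is_flashback = latest_seen is not None and ev.story_date < latest_seen
        if latest_seen is None or ev.story_date > latest_seen:
            latest_seen = ev.story_date
        flagged.append((ev, is_flashback))
    # Re-anchor the stream chronologically, keeping narration order for ties.
    flagged.sort(key=lambda pair: (pair[0].story_date, pair[0].narration_index))
    return flagged


# Made-up example narrative with one flashback in the middle.
events = [
    Event(0, date(2010, 5, 1), "I started my first job."),
    Event(1, date(1998, 9, 1), "I remembered my first day of school."),
    Event(2, date(2012, 3, 15), "I moved to a new city."),
]
stream = to_time_anchored_stream(events)
for ev, is_flashback in stream:
    print(ev.story_date.isoformat(), "flashback" if is_flashback else "linear", ev.text)
```

Keeping the flashback flag next to each event lets a downstream question refer to both orders: when something happened and when the narrator chose to reveal it.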
Problem

Research questions and friction points this paper is trying to address.

person understanding
lifelong digital companions
memory benchmarks
autobiographical narratives
principle-level reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

person understanding
lifelong digital companions
autobiographical narratives
evidence-linked reasoning
memory benchmark