AI Summary
Existing long-term memory benchmarks predominantly rely on multi-turn dialogues or synthetic user histories, which inadequately capture a model's capacity for deep human understanding. This work proposes the first public benchmark grounded in long-form autobiographical narratives, integrating dense evidence from behaviors, contextual details, and internal mental states to construct a temporally anchored, flashback-aware evaluation pipeline. The benchmark features question-answering tasks that require cross-temporal evidence integration, moving beyond conventional reliance on retrieval accuracy alone. It introduces novel evaluation mechanisms centered on narrative reconstruction, evidence linking, and retrieval-augmented reasoning. Experimental results demonstrate that while current retrieval-augmented systems perform well on factual recall, they exhibit significant limitations in temporal reasoning and in higher-order attribution of psychological states.
Abstract
Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, making retrieval performance an imperfect proxy for person understanding. We present \BenchName, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. \BenchName~reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at \href{https://github.com/QuantaAlpha/KnowMeBench}{KnowMeBench}.