AI Summary
This work addresses the absence of benchmark datasets capable of evaluating complex user history reasoning in advanced multimodal personalization research. To this end, we introduce Life-Bench, the first synthetic multimodal benchmark that encompasses both personality understanding and complex historical reasoning. We further propose LifeGraph, an end-to-end knowledge graph framework that structures personal digital footprints to support retrieval and reasoning. Experimental results demonstrate that existing methods exhibit limited performance on relational, temporal, and aggregative reasoning tasks, whereas LifeGraph substantially narrows the performance gap. These findings validate the efficacy of a knowledge graph-driven, structured approach to personalized modeling and underscore the significant research potential remaining in this direction.
Abstract
The powerful reasoning abilities of modern Vision Language Models open a new frontier for advanced personalization research. However, progress in this area is critically hampered by the lack of suitable benchmarks. To address this gap, we introduce Life-Bench, a comprehensive, synthetically generated multimodal benchmark built on simulated user digital footprints. Life-Bench features a broad set of questions evaluating a wide spectrum of capabilities, from persona understanding to complex reasoning over historical data. These capabilities extend far beyond prior benchmarks, reflecting demands essential for real-world applications. Furthermore, we propose LifeGraph, an end-to-end framework that organizes personal context into a knowledge graph to facilitate structured retrieval and reasoning. Our experiments on Life-Bench reveal that existing methods falter significantly on complex personalized tasks, exposing large performance headroom, especially in relational, temporal, and aggregative reasoning. While LifeGraph closes much of this gap by leveraging structured knowledge and demonstrates a promising direction, these advanced personalization tasks remain a critical open challenge, motivating new research in this area.
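To make the idea concrete, the kind of structured retrieval LifeGraph enables can be sketched in a few lines: digital-footprint events are indexed as relation-keyed triples, and an aggregative-temporal question ("how often did the user visit the gym in January?") becomes a simple graph query instead of free-form reasoning over raw history. This is a minimal illustrative sketch only; the event records, relation names, and schema below are invented for the example and are not the paper's actual LifeGraph implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mini "digital footprint": (subject, relation, object, timestamp)
# triples. All records and relation names here are invented for illustration.
footprint = [
    ("user", "visited", "gym", date(2024, 1, 5)),
    ("user", "visited", "gym", date(2024, 1, 12)),
    ("user", "visited", "cafe", date(2024, 1, 13)),
    ("user", "photographed", "sunset", date(2024, 1, 13)),
]

# Index the triples into a simple adjacency structure keyed by (subject, relation),
# so queries touch only the relevant edges rather than the whole history.
graph = defaultdict(list)
for subj, rel, obj, ts in footprint:
    graph[(subj, rel)].append((obj, ts))

def count_visits(place, start, end):
    """Aggregative-temporal query: visits to `place` within [start, end]."""
    return sum(
        1
        for obj, ts in graph[("user", "visited")]
        if obj == place and start <= ts <= end
    )

print(count_visits("gym", date(2024, 1, 1), date(2024, 1, 31)))  # prints 2
```

The design point is that aggregation and temporal filtering are pushed into the structured index, which is precisely the class of questions the abstract reports as hardest for unstructured retrieval baselines.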