LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video benchmarks primarily focus on passive understanding and are ill-suited for evaluating the ability of multimodal large language models to provide real-time, interactive assistance for everyday tasks in dynamic real-world environments. To address this gap, this work introduces LifeEval, the first task-centric evaluation framework for real-time human-AI collaboration, grounded in continuous first-person video streams and natural dialogue. The authors construct a high-quality benchmark of 4,075 rigorously annotated samples spanning six core capability dimensions. A systematic evaluation of 26 state-of-the-art models reveals significant deficiencies in timeliness, effectiveness, and interactive adaptability, establishing a foundation for research on human-centered interactive intelligence in authentic everyday scenarios.

📝 Abstract
The rapid progress of Multimodal Large Language Models (MLLMs) marks a significant step toward artificial general intelligence, offering great potential for augmenting human capabilities. However, their ability to provide effective assistance in dynamic, real-world environments remains largely underexplored. Existing video benchmarks predominantly assess passive understanding through retrospective analysis or isolated perception tasks, failing to capture the interactive and adaptive nature of real-time user assistance. To bridge this gap, we introduce LifeEval, a multimodal benchmark designed to evaluate real-time, task-oriented human-AI collaboration in daily life from an egocentric perspective. LifeEval emphasizes three key aspects: task-oriented holistic evaluation, egocentric real-time perception from continuous first-person streams, and human-assistant collaborative interaction through natural dialogues. Constructed via a rigorous annotation pipeline, the benchmark comprises 4,075 high-quality question-answer pairs across 6 core capability dimensions. Extensive evaluations of 26 state-of-the-art MLLMs on LifeEval reveal substantial challenges in achieving timely, effective and adaptive interaction, highlighting essential directions for advancing human-centered interactive intelligence.
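To make the setup described in the abstract concrete, the sketch below shows one plausible shape for a LifeEval-style QA sample and a minimal per-dimension scoring loop. This is an illustration only: the field names (video_path, query_time_s, capability, etc.), the exact-match scoring rule, and the model_fn interface are all assumptions, not the authors' released schema or metrics.

from dataclasses import dataclass
from collections import defaultdict

# Hypothetical record layout for one LifeEval-style sample; the real
# benchmark's schema is not specified in this summary.
@dataclass
class Sample:
    video_path: str       # continuous egocentric (first-person) stream
    query_time_s: float   # timestamp at which the user asks for help
    dialogue: list[str]   # prior user-assistant turns, oldest first
    question: str
    reference: str        # annotated reference answer
    capability: str       # one of the 6 core capability dimensions

def exact_match(prediction: str, reference: str) -> bool:
    """Toy scoring rule; the paper's actual metrics are not given here."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model_fn, samples: list[Sample]) -> dict[str, float]:
    """Aggregate accuracy per capability dimension.

    model_fn should only consume video and dialogue up to query_time_s,
    mirroring the benchmark's real-time (no look-ahead) constraint.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for s in samples:
        prediction = model_fn(s)
        total[s.capability] += 1
        correct[s.capability] += exact_match(prediction, s.reference)
    return {dim: correct[dim] / total[dim] for dim in total}

A harness like this would report one score per capability dimension rather than a single aggregate, which matches the paper's framing of evaluating timeliness, effectiveness, and adaptability as distinct axes.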
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
egocentric perception
real-time assistance
human-AI collaboration
interactive intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
egocentric vision
real-time assistance
human-AI collaboration
task-oriented evaluation
👥 Authors
Hengjian Gao
Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University
Kaiwei Zhang
Shanghai Artificial Intelligence Laboratory
Shibo Wang
Shanghai Jiao Tong University
Mingjie Chen
KU Leuven
isogeny-based cryptography · algorithmic number theory
Qihang Cao
Shanghai University of Electric Power
Xianfeng Wang
Shanghai Jiao Tong University
Yucheng Zhu
Shanghai Jiao Tong University
Multimedia Signal Processing
Xiongkuo Min
Shanghai Jiao Tong University
Wei Sun
East China Normal University
Dandan Zhu
East China Normal University
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing · Visual Quality Assessment · QoE · AI Evaluation · Displays