TeleEgo: Benchmarking Egocentric AI Assistants in the Wild

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing benchmarks predominantly focus on short-term, single-modality tasks, so they fail to evaluate the integrated capabilities of first-person AI assistants (multimodal input fusion, real-time responsiveness, and long-term memory retention) in realistic streaming scenarios. Method: We introduce TeleEgo, a streaming, omni-modal benchmark for extended daily tasks, spanning work and study, lifestyle and routines, social activities, and outings and culture. We propose two metrics, Real-Time Accuracy and Memory Persistence Time, and design 12 diagnostic subtasks that assess memory, understanding, and cross-memory reasoning on a unified global timeline. Input streams consist of synchronized first-person video, audio, and text, augmented with human-refined visual narrations and ASR transcripts. Contribution/Results: We release a dataset with over 14 hours of recordings per participant and 3,291 human-verified QA items, enabling reproducible, fine-grained evaluation of egocentric AI assistants in practical, temporally grounded settings.

📝 Abstract

Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks typically evaluate these abilities in isolation, lack realistic streaming scenarios, or support only short-term tasks. We introduce TeleEgo, a long-duration, streaming, omni-modal benchmark for evaluating egocentric AI assistants in realistic daily contexts. The dataset features over 14 hours per participant of synchronized egocentric video, audio, and text across four domains: work & study, lifestyle & routines, social activities, and outings & culture. All data is aligned on a unified global timeline and includes high-quality visual narrations and speech transcripts, curated through human refinement. TeleEgo defines 12 diagnostic subtasks across three core capabilities: Memory (recalling past events), Understanding (interpreting the current moment), and Cross-Memory Reasoning (linking distant events). It contains 3,291 human-verified QA items spanning multiple question formats (single-choice, binary, multi-choice, and open-ended), evaluated strictly in a streaming setting. We propose two key metrics -- Real-Time Accuracy and Memory Persistence Time -- to jointly assess correctness, temporal responsiveness, and long-term retention. TeleEgo provides a realistic and comprehensive evaluation to advance the development of practical AI assistants.
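The abstract names the two metrics but does not give their formulas, so the following is only an illustrative sketch under stated assumptions, not the benchmark's actual scoring code. It assumes Real-Time Accuracy is the fraction of questions answered correctly within a fixed decision window after being asked, and that Memory Persistence Time is how long a model keeps answering a question correctly when re-queried, up to its first failure. The `QAAttempt` schema, the `window_s` parameter, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class QAAttempt:
    """One streamed question (hypothetical schema, not the TeleEgo format)."""
    asked_at: float                # seconds on the global timeline
    answered_at: Optional[float]   # None if the model never answered
    correct: bool


def real_time_accuracy(attempts: Sequence[QAAttempt], window_s: float) -> float:
    """Assumed definition: share of questions answered correctly
    within `window_s` seconds of being asked."""
    if not attempts:
        return 0.0
    hits = sum(
        1
        for a in attempts
        if a.correct
        and a.answered_at is not None
        and 0.0 <= a.answered_at - a.asked_at <= window_s
    )
    return hits / len(attempts)


def memory_persistence_time(
    first_correct_at: float, recalls: Sequence[Tuple[float, bool]]
) -> float:
    """Assumed definition: time from the first correct answer to the last
    correct re-query that precedes the first failed re-query."""
    last_ok = first_correct_at
    for t, ok in sorted(recalls):
        if not ok:
            break  # later successes do not count once recall has failed
        last_ok = t
    return last_ok - first_correct_at


attempts = [
    QAAttempt(asked_at=10.0, answered_at=12.0, correct=True),   # in window
    QAAttempt(asked_at=20.0, answered_at=40.0, correct=True),   # too late
    QAAttempt(asked_at=30.0, answered_at=31.0, correct=False),  # wrong
    QAAttempt(asked_at=50.0, answered_at=None, correct=False),  # no answer
]
rta = real_time_accuracy(attempts, window_s=5.0)
mpt = memory_persistence_time(
    100.0, [(160.0, True), (220.0, True), (400.0, False), (500.0, True)]
)
```

Under these assumptions, a correct answer that arrives after the window counts as a miss, which is one way a single score could reflect both correctness and temporal responsiveness.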

Problem

Research questions and friction points this paper is trying to address.

Evaluating egocentric AI assistants in realistic daily streaming scenarios

Assessing multi-modal memory, understanding, and cross-memory reasoning capabilities

Measuring real-time accuracy and long-term memory persistence in assistants

Innovation

Methods, ideas, or system contributions that make the work stand out.

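TeleEgo benchmarks egocentric AI assistants on long-duration, streaming, omni-modal recordings of daily life

Synchronized egocentric video, audio, and text are aligned on a unified global timeline

Two metrics, Real-Time Accuracy and Memory Persistence Time, jointly measure correctness, temporal responsiveness, and long-term retention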
