🤖 AI Summary
This work addresses the online detection of mistakes in first-person videos, covering both procedural errors (e.g., step misordering) and execution errors (e.g., motion inaccuracies or tool misuse). We propose the first end-to-end online detection-feedback framework that jointly models both error types. The method combines temporal action recognition, sliding-window online inference, and multimodal feature alignment, and uses a large language model to generate interpretable natural-language feedback. Unlike prior approaches that target only one error category, it enables fine-grained, real-time, and explainable detection and intervention for both classes of error. On the HoloAssist benchmark, the framework ranks second in the mistake detection task, demonstrating robustness and practical utility for real-world industrial and educational settings.
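The detection-feedback loop described above can be sketched as follows. This is a minimal illustration only: the window length, the thresholding heuristic in `classify_window`, and the `generate_feedback` stub are assumptions standing in for the temporal action-recognition model and the LLM call, not the authors' implementation.

```python
from collections import deque

WINDOW = 8  # assumed sliding-window length, in frames


def classify_window(features):
    # Placeholder for the temporal action-recognition head:
    # flags a mistake when the mean per-frame activation exceeds
    # a threshold (illustrative heuristic, not the real model).
    return "mistake" if sum(features) / len(features) > 0.5 else "correct"


def generate_feedback(window_end):
    # Stand-in for the LLM call that would produce
    # natural-language corrective feedback.
    return f"Possible error near frame {window_end}: check step order or tool use."


def online_detect(frame_stream):
    """Slide a fixed-size window over incoming per-frame features,
    classify each full window, and emit feedback on detected mistakes."""
    buffer = deque(maxlen=WINDOW)
    feedback = []
    for i, feat in enumerate(frame_stream):
        buffer.append(feat)
        if len(buffer) == WINDOW and classify_window(buffer) == "mistake":
            feedback.append(generate_feedback(i))
    return feedback
```

The deque keeps only the most recent `WINDOW` frames, so detection runs incrementally as frames arrive rather than after the full video, which is the defining constraint of the online setting.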
📝 Abstract
In this report, we address the task of online mistake detection, which is vital in domains like industrial automation and education, where real-time video analysis allows human operators to correct errors as they occur. While previous work focuses on procedural errors involving action order, broader error types must be handled for real-world use. We introduce an online mistake detection framework that covers both procedural and execution errors (e.g., motor slips or tool misuse). Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach, which places second on the mistake detection task.