Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge

📅 2025-06-06
🤖 AI Summary
This work addresses the online detection of mistakes in first-person (egocentric) videos, covering both procedural errors (e.g., step misordering) and execution errors (e.g., motion inaccuracies or tool misuse). We propose an end-to-end online framework that closes the loop between detection and feedback and models both error types jointly. Our method integrates temporal action recognition, sliding-window online inference, and multimodal feature alignment, and uses large language models to generate interpretable natural-language feedback. Unlike prior approaches that target only one error category, ours supports fine-grained, real-time, and explainable joint detection and intervention for both. On the HoloAssist benchmark, our framework ranks second in the mistake detection task, which points to its potential for industrial and educational settings.
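The summary above mentions sliding-window online inference. As a rough illustration of that general idea only (not the authors' implementation), the sketch below assumes a hypothetical upstream model that emits one mistake probability per frame (`frame_scores`) and raises an alert whenever the mean over the most recent frames exceeds a threshold. The function name, `window_size`, and `threshold` are illustrative assumptions, not values from the report.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def online_mistake_alerts(
    frame_scores: Iterable[float],
    window_size: int = 8,
    threshold: float = 0.5,
) -> Iterator[Tuple[int, float]]:
    """Yield (frame_index, window_mean) whenever the mean per-frame mistake
    score over the last `window_size` frames exceeds `threshold`.

    Hypothetical sketch: `frame_scores` stands in for the output of some
    upstream recognition model; the defaults are illustrative only.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # The deque drops the oldest score automatically once it is full.
    window: Deque[float] = deque(maxlen=window_size)
    for t, score in enumerate(frame_scores):
        window.append(score)
        if len(window) < window_size:
            continue  # wait until the window is full before deciding
        # Only frames t - window_size + 1 .. t are used, so the check is causal.
        mean = sum(window) / window_size
        if mean > threshold:
            yield t, mean
```

Because the decision at frame `t` depends only on frames up to `t`, the detector stays causal, as an online setting requires; a larger window smooths noisy scores at the cost of later alerts.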

📝 Abstract
In this report, we address the task of online mistake detection, which is vital in domains like industrial automation and education, where real-time video analysis allows human operators to correct errors as they occur. While previous work focuses on procedural errors involving action order, real-world use requires addressing broader error types. We introduce an online mistake detection framework that handles both procedural and execution errors (e.g., motor slips or tool misuse). Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach, which placed second in the mistake detection task.
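Procedural errors concern the order of actions. One simple way to picture an online check for them is sketched below, assuming the task's ordering constraints are available as a map from each step to the steps that must precede it. The class name, the step names, and this prerequisite-map representation are hypothetical; the report does not state how its procedural model is structured.

```python
from typing import Dict, List, Optional, Set


class ProceduralOrderChecker:
    """Flags a recognized step as a procedural mistake if any of its
    prerequisite steps has not been observed yet.

    Hypothetical sketch: the prerequisite map and step names are illustrative.
    """

    def __init__(self, prerequisites: Dict[str, Set[str]]) -> None:
        self.prerequisites = prerequisites
        self.completed: Set[str] = set()

    def observe(self, step: str) -> Optional[List[str]]:
        """Record `step` and return its missing prerequisites (sorted),
        or None if every prerequisite was already observed."""
        missing = sorted(self.prerequisites.get(step, set()) - self.completed)
        self.completed.add(step)  # the step counts as done even if it was out of order
        return missing or None


checker = ProceduralOrderChecker(
    {"insert_battery": {"open_cover"}, "close_cover": {"insert_battery"}}
)
print(checker.observe("open_cover"))   # None: no prerequisites
print(checker.observe("close_cover"))  # ['insert_battery']: step misordering
```

An order check like this cannot catch execution errors such as a motor slip during a correctly ordered step, which is why the framework also targets that second error type.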
Problem

Research questions and friction points this paper is trying to address.

Detects procedural and execution errors in real time
Generates explanatory feedback using large language models
Validates effectiveness on the HoloAssist benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online mistake detection framework for egocentric video
Combines procedural and execution error handling
LLM generates explanatory feedback upon detection
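The report uses an LLM to generate explanatory feedback once a mistake is detected, but this summary does not give the prompt or the model. The sketch below shows one hypothetical way to package a detection into a prompt; the `DetectedMistake` fields, the function name, and the prompt wording are assumptions, and sending the prompt to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class DetectedMistake:
    # Hypothetical record emitted by the detector; field names are illustrative.
    error_type: str  # e.g. "procedural" or "execution"
    step: str
    detail: str
    timestamp_s: float


def build_feedback_prompt(mistake: DetectedMistake, task: str) -> str:
    """Format a detected mistake into a prompt asking an LLM for short,
    corrective natural-language feedback."""
    return (
        f"The user is performing the task: {task}.\n"
        f"Detected {mistake.error_type} error at {mistake.timestamp_s:.1f} s "
        f"during step '{mistake.step}': {mistake.detail}.\n"
        "Explain the mistake to the user in one or two sentences "
        "and suggest how to correct it."
    )


prompt = build_feedback_prompt(
    DetectedMistake("procedural", "close_cover", "the battery was not inserted first", 12.34),
    task="replace a battery",
)
print(prompt)
```

Keeping the detector output structured (error type, step, time) and formatting it only at the prompt stage lets the same detection drive other interfaces besides the LLM.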
Constantin Patsch
Technical University of Munich
Marsil Zakour
Technical University of Munich
Yuankai Wu
Technical University of Munich
Eckehard Steinbach
Professor for Media Technology
Media Technology, Image and Video Compression, Multimedia Communication, Haptic Communication, Visual Localization