Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities

📅 2024-06-12
🏛️ Computer Vision and Pattern Recognition
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper addresses the detection of mistakes in skilled human activities from first-person video without requiring error annotations. It proposes an unsupervised method that uses gaze deviation, the discrepancy between gaze trajectories measured by an eye tracker and those estimated by a gaze prediction model, as a mistake indicator. Key contributions include: (i) a novel gaze completion task that explicitly models cross-modal associations between eye movements and local visual tokens; and (ii) gaze deviation as a mistake metric that remains robust under the high uncertainty of gaze prediction. The method combines eye-tracking analysis, self-supervised cross-modal modeling, and attention-deviation assessment. Evaluated on EPIC-Tent, HoloAssist, and IndustReal, it achieves relative improvements of +14%, +11%, and +5%, respectively, matching supervised methods without using any labels and ranking first on the HoloAssist Mistake Detection challenge.

📝 Abstract
We address the challenge of unsupervised mistake detection in egocentric video of skilled human activities through the analysis of gaze signals. While traditional methods rely on manually labeled mistakes, our approach does not require mistake annotations, hence overcoming the need for domain-specific labeled data. Based on the observation that eye movements closely follow object manipulation activities, we assess to what extent eye-gaze signals can support mistake detection, proposing to identify deviations in attention patterns measured through a gaze tracker with respect to those estimated by a gaze prediction model. Since predicting gaze in video is characterized by high uncertainty, we propose a novel gaze completion task, where eye fixations are predicted from visual observations and partial gaze trajectories, and contribute a novel gaze completion approach which explicitly models correlations between gaze information and local visual tokens. Inconsistencies between predicted and observed gaze trajectories act as an indicator to identify mistakes. Experiments highlight the effectiveness of the proposed approach in different settings, with relative gains up to +14%, +11%, and +5% in EPIC-Tent, HoloAssist, and IndustReal respectively, remarkably matching results of supervised approaches without seeing any labels. We further show that gaze-based analysis is particularly useful in the presence of skilled actions, low action execution confidence, and actions requiring hand-eye coordination and object manipulation skills. Our method is ranked first on the HoloAssist Mistake Detection challenge.
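The core scoring idea, flagging a segment as a mistake when observed gaze drifts too far from predicted gaze, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the normalized coordinate convention and the threshold value are assumptions for the example.

```python
import numpy as np

def gaze_deviation_score(observed, predicted):
    """Mean Euclidean distance between observed and predicted gaze,
    each a (T, 2) array of normalized (x, y) fixation coordinates."""
    return float(np.linalg.norm(observed - predicted, axis=1).mean())

def flag_mistake(observed, predicted, threshold=0.15):
    """Flag a video segment as a mistake when the average deviation
    between the tracked and predicted trajectories exceeds a
    threshold (the value 0.15 is illustrative, not from the paper)."""
    return gaze_deviation_score(observed, predicted) > threshold
```

In this sketch, `predicted` would come from the gaze completion model; a well-executed action should yield a small score, while a mistake that disrupts the usual hand-eye coordination should inflate it.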
Problem

Research questions and friction points this paper is trying to address.

Detecting mistakes in skilled activities without labeled data
Using eye-gaze deviations to identify attention pattern errors
Predicting gaze trajectories to uncover action inconsistencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses eye-gaze signals for mistake detection
Proposes gaze completion task for prediction
Identifies inconsistencies in gaze trajectories
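The gaze completion task named above can be illustrated with a small sketch: mask part of an observed trajectory, then predict the masked fixations from the rest. The placeholder predictor here is plain linear interpolation; the paper's model instead conditions on local visual tokens, which this stand-in omits entirely, so treat it as a task-setup illustration only.

```python
import numpy as np

def make_completion_sample(gaze, mask_idx):
    """Build a gaze-completion input: copy the (T, 2) trajectory and
    replace the fixations at mask_idx with NaN sentinels."""
    partial = gaze.astype(float).copy()
    partial[mask_idx] = np.nan
    return partial

def interpolate_masked(partial):
    """Placeholder predictor: fill masked fixations by linear
    interpolation along each coordinate over time."""
    filled = partial.copy()
    t = np.arange(len(partial))
    for d in range(partial.shape[1]):
        col = partial[:, d]
        masked = np.isnan(col)
        filled[masked, d] = np.interp(t[masked], t[~masked], col[~masked])
    return filled
```

Comparing the completed trajectory against the tracked one on the masked positions then yields the deviation signal used for mistake detection.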