🤖 AI Summary
This work addresses the scarcity of temporally synchronized, finely annotated first-person (ego) and third-person (exo) multi-view human activity data in real-world industrial settings, a key bottleneck for advancing intelligent assistance and safety systems. To bridge this gap, we introduce ENIGMA-360, a novel dataset captured in an authentic industrial environment, comprising 180 time-synchronized ego-exo procedural video pairs with fine-grained spatiotemporal annotations. This is the first large-scale effort to achieve synchronized acquisition and detailed labeling in such a complex real-world setting. We further define three benchmark tasks: temporal action segmentation, keystep recognition, and egocentric human-object interaction detection. Baseline experiments reveal the limited performance of current methods, underscoring the need for robust ego-exo fusion models. The dataset and annotations are publicly released to foster community research.
📝 Abstract
Understanding human behavior from complementary egocentric (ego) and exocentric (exo) points of view enables the development of systems that can support workers in industrial environments and enhance their safety. However, progress in this area is hindered by the lack of datasets capturing both views in realistic industrial scenarios. To address this gap, we propose ENIGMA-360, a new ego-exo dataset acquired in a real industrial scenario. The dataset is composed of 180 egocentric and 180 exocentric procedural videos, temporally synchronized to offer complementary views of the same scene. The 360 videos have been labeled with temporal and spatial annotations, enabling the study of different aspects of human behavior in the industrial domain. We provide baseline experiments for three foundational tasks in human behavior understanding: 1) Temporal Action Segmentation, 2) Keystep Recognition, and 3) Egocentric Human-Object Interaction Detection, showing the limits of state-of-the-art approaches in this challenging scenario. These results highlight the need for new models capable of robust ego-exo understanding in real-world environments. We publicly release the dataset and its annotations at https://iplab.dmi.unict.it/ENIGMA-360.