🤖 AI Summary
This work addresses first-person (egocentric) human-object interaction (HOI) detection in industrial settings, where scarce annotated data hinders the development of robust models. To overcome this, the authors propose a data generation framework that combines synthetic data with diffusion-based augmentation of real-world images, adding realistic personal protective equipment (PPE). They introduce GlovEgo-HOI, a new benchmark dataset for industrial egocentric HOI, and present GlovEgo-Net, a model that integrates Glove-Head and Keypoint-Head modules and uses hand pose information to improve interaction detection. Experiments demonstrate the effectiveness of both the data generation framework and GlovEgo-Net. The dataset, augmentation pipeline, and pre-trained models are released to support further research in this domain.
📝 Abstract
Egocentric Human-Object Interaction (EHOI) analysis is crucial for industrial safety, yet the development of robust models is hindered by the scarcity of annotated domain-specific data. We address this challenge by introducing a data generation framework that combines synthetic data with a diffusion-based process to augment real-world images with realistic Personal Protective Equipment (PPE). We present GlovEgo-HOI, a new benchmark dataset for industrial EHOI, and GlovEgo-Net, a model integrating Glove-Head and Keypoint-Head modules to leverage hand pose information for enhanced interaction detection. Extensive experiments demonstrate the effectiveness of the proposed data generation framework and GlovEgo-Net. To foster further research, we release the GlovEgo-HOI dataset, augmentation pipeline, and pre-trained models at: GitHub project.