🤖 AI Summary
This work addresses human pose estimation from a forward-facing, head-mounted event camera in dynamic scenes. This is a challenging setting, because conventional RGB-based methods degrade under low illumination and the motion blur caused by high-speed motion.

Method: We formally define the novel task of forward-facing egocentric pose estimation with an event camera and propose D-EventEgo. A Motion Segmentation module uses the event stream to segment dynamic objects and remove their events, leaving the background events used for head pose estimation. The estimated head poses then serve as conditions for generating full-body poses.

Contribution/Results: We construct EgoEvent, the first synthetic dynamic event dataset tailored to this task, built upon EgoBody. On a custom dynamic event test set, our method outperforms the baseline on four of five evaluation metrics, indicating improved robustness in dynamic environments.
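The filtering role of motion segmentation can be illustrated with a minimal Python sketch of that step alone. It assumes the segmentation output is a per-pixel boolean mask over the sensor for a single time window. The `Event` class, `filter_background_events`, and the mask format are hypothetical; the summary does not specify how the Motion Segmentation module represents its output.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: pixel (x, y), timestamp t in seconds, polarity p (+1/-1)."""
    x: int
    y: int
    t: float
    p: int


def filter_background_events(
    events: Sequence[Event], dynamic_mask: Sequence[Sequence[bool]]
) -> List[Event]:
    """Keep only events whose pixel is NOT marked as dynamic.

    dynamic_mask[y][x] is True where an assumed segmentation model labeled
    the pixel as part of a moving object (hypothetical mask format).
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    background = []
    for ev in events:
        # Defensive check: skip events outside the mask bounds.
        if not (0 <= ev.y < height and 0 <= ev.x < width):
            continue
        # Keep the event only if its pixel belongs to the static background.
        if not dynamic_mask[ev.y][ev.x]:
            background.append(ev)
    return background


# Toy example: a 2x3 sensor whose right-most column contains a moving object.
mask = [
    [False, False, True],
    [False, False, True],
]
events = [
    Event(0, 0, 0.001, 1),
    Event(2, 0, 0.002, -1),  # on the moving object, removed
    Event(1, 1, 0.003, 1),
    Event(2, 1, 0.004, 1),   # on the moving object, removed
]
kept = filter_background_events(events, mask)
print([(e.x, e.y) for e in kept])  # [(0, 0), (1, 1)]
```

Because dynamic regions move, a mask like this would only apply to the events of the time window it was predicted for. How D-EventEgo windows the event stream is not stated in this summary.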
📝 Abstract
Estimating human pose with a front-facing egocentric camera is essential for applications such as sports motion analysis, VR/AR, and AI for wearable devices. However, many existing methods rely on RGB cameras and do not account for low-light environments or motion blur. Event-based cameras have the potential to address these challenges. In this work, we introduce a novel task of human pose estimation using a front-facing event-based camera mounted on the head, and propose D-EventEgo, the first framework for this task. The proposed method first estimates head poses and then uses them as conditions to generate body poses. However, when events from dynamic objects are mixed with background events, head pose estimation accuracy may degrade. Therefore, we introduce the Motion Segmentation Module to remove dynamic objects and extract background information. Extensive experiments on our synthetic event-based dataset derived from EgoBody demonstrate that our approach outperforms our baseline in four out of five evaluation metrics in dynamic environments.
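The two-stage design described in the abstract can be sketched as a small pipeline: remove dynamic-object events, estimate the head pose from the remaining background events, then condition body-pose generation on that head pose. This is a hypothetical Python illustration of the data flow only. `TwoStagePoseEstimator` and the three toy callables are placeholders, not the D-EventEgo models. The sketch also assumes the body generator receives only the head pose, which the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representations: an event is (x, y, t, polarity);
# a pose is a flat tuple of floats.
EventT = Tuple[int, int, float, int]
Pose = Tuple[float, ...]


@dataclass
class TwoStagePoseEstimator:
    """Hypothetical sketch: head pose first, then body pose conditioned on it."""

    segment_dynamic: Callable[[Sequence[EventT]], List[bool]]  # True = dynamic-object event
    estimate_head: Callable[[Sequence[EventT]], Pose]
    generate_body: Callable[[Pose], Pose]

    def __call__(self, events: Sequence[EventT]) -> Tuple[Pose, Pose]:
        # Stage 0: motion segmentation labels each event as dynamic or background.
        is_dynamic = self.segment_dynamic(events)
        background = [ev for ev, dyn in zip(events, is_dynamic) if not dyn]
        # Stage 1: estimate the head pose from background events only.
        head = self.estimate_head(background)
        # Stage 2: generate the body pose with the head pose as the condition.
        body = self.generate_body(head)
        return head, body


# Toy stand-ins, for illustration only.
def toy_segment(events: Sequence[EventT]) -> List[bool]:
    # Pretend every event with x >= 100 belongs to a moving object.
    return [ev[0] >= 100 for ev in events]


def toy_head(events: Sequence[EventT]) -> Pose:
    # Pretend the "head pose" is the mean (x, y) of the background events.
    n = max(len(events), 1)
    return (sum(ev[0] for ev in events) / n, sum(ev[1] for ev in events) / n)


def toy_body(head: Pose) -> Pose:
    # Pretend the body pose is the head pose shifted by a fixed offset.
    return (head[0], head[1] + 2.0)


pipeline = TwoStagePoseEstimator(toy_segment, toy_head, toy_body)
head, body = pipeline([(10, 20, 0.0, 1), (30, 40, 0.01, -1), (150, 5, 0.02, 1)])
print(head, body)  # (20.0, 30.0) (20.0, 32.0)
```

The three stages are injected as callables. This makes it straightforward to check the property the abstract relies on: the head pose estimator never sees events labeled as dynamic.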