🤖 AI Summary
Existing RGB/depth-based 3D hand mesh reconstruction fails under low-light conditions and high-speed motion, while current event-camera approaches are constrained to static backgrounds and fixed camera viewpoints. To address these limitations, this paper proposes the first egocentric, event-driven hand reconstruction framework capable of handling dynamic backgrounds and arbitrary camera motion. Methodologically, the authors design a lightweight hand segmentation module to suppress background noise and integrate event stream encoding, temporal feature modeling, and differentiable mesh regression into an end-to-end trainable architecture. Evaluated on the N-HOT3D dataset, the method achieves a mean per-joint position error (MPJPE) of 5.9 cm, reducing the error by 4.5 cm (43%) relative to the best prior event-based method, and enables real-time, robust 3D hand reconstruction in complex, dynamically changing scenes for the first time.
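For reference, MPJPE is the standard metric here: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints (and, in practice, over all evaluated frames). Below is a minimal NumPy sketch; the function name and shapes are illustrative, not taken from the paper:

```python
import numpy as np

def mpjpe(pred_joints: np.ndarray, gt_joints: np.ndarray) -> float:
    """Mean per-joint position error, in the units of the inputs.

    pred_joints, gt_joints: (J, 3) arrays of 3D joint positions for
    J hand joints (e.g., J = 21 for a typical hand skeleton).
    """
    # Euclidean distance per joint, then mean over all joints.
    return float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean())
```

With joint positions expressed in centimeters, a return value of 5.9 would correspond to the headline result quoted above.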
📝 Abstract
Reconstructing 3D hand meshes is a challenging but important task for human-computer interaction and AR/VR applications. RGB and/or depth cameras have been widely used for this task; however, methods built on these conventional cameras struggle in low-light environments and under motion blur. To address these limitations, event cameras have been attracting attention in recent years for their high dynamic range and high temporal resolution. Despite these advantages, event cameras are sensitive to background noise and camera motion, which has limited existing studies to static backgrounds and fixed cameras. In this study, we propose EventEgoHands, a novel method for event-based 3D hand mesh reconstruction from an egocentric view. Our approach introduces a Hand Segmentation Module that extracts hand regions, effectively mitigating the influence of dynamic background events. We evaluated our approach on the N-HOT3D dataset and demonstrated its effectiveness, improving MPJPE by more than 4.5 cm (43%).
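The paper's Hand Segmentation Module is not detailed in this abstract, but its role, suppressing dynamic background events before reconstruction, can be illustrated with a small sketch. Assumptions: events arrive as (x, y, t, polarity) rows and the module yields a per-pixel boolean hand mask; `mask_events` and `hand_mask` are hypothetical names for illustration, not the paper's API:

```python
import numpy as np

def mask_events(events: np.ndarray, hand_mask: np.ndarray) -> np.ndarray:
    """Keep only events whose pixel falls inside a predicted hand mask.

    events: (N, 4) array of (x, y, t, polarity) rows from the event camera.
    hand_mask: (H, W) boolean array, True where the segmentation module
        (not sketched here) predicts hand pixels.
    """
    xs = events[:, 0].astype(int)  # pixel column of each event
    ys = events[:, 1].astype(int)  # pixel row of each event
    keep = hand_mask[ys, xs]       # True where the event lies on the hand
    return events[keep]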