EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

📅 2026-05-12

🤖 AI Summary
This work addresses egocentric 3D hand pose estimation and gesture recognition, where conventional frame-based cameras suffer from motion blur and limited dynamic range, while existing event-based approaches are hindered by ego-motion interference, monocular depth ambiguity, and a lack of large-scale real-world stereo event data. The authors propose EgoEV-HandPose, an end-to-end framework for joint 3D bimanual pose estimation and gesture recognition from stereo event streams, and introduce EgoEVHands, the first large-scale real-world stereo event-camera dataset for egocentric hand perception (5,419 annotated sequences across 38 gesture classes). The framework's KeypointBEV module lifts stereo features into a canonical bird's-eye-view (BEV) space and applies an iterative reprojection-guided refinement loop to progressively resolve depth uncertainty and enforce kinematic consistency. The reported results are a mean per-joint position error (MPJPE) of 30.54 mm and a Top-1 gesture recognition accuracy of 86.87%, significantly outperforming RGB-based stereo and prior event-camera methods, particularly in low-light and bimanual occlusion scenarios.
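The two reported metrics have standard definitions, sketched below as a minimal illustration. This is not the authors' evaluation code: the function names `mpjpe` and `top1_accuracy` are hypothetical, and the sketch assumes predictions and ground truth are already in the same coordinate frame and units, with no alignment step applied.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance between
    predicted and ground-truth 3D joints, in the units of the inputs (e.g. mm)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class index equals the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for class_scores, label in zip(scores, labels):
        # argmax over the per-class scores of one sample
        predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)

# Toy inputs (made up for illustration, not taken from the paper).
joints_pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
joints_gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(joints_pred, joints_gt))

gesture_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
gesture_labels = [1, 0, 0]
print(top1_accuracy(gesture_scores, gesture_labels))
```

In practice MPJPE is averaged over all joints of all evaluated frames, so a benchmark figure such as 30.54 mm summarizes the whole test set rather than a single hand.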
📝 Abstract
Egocentric 3D hand pose estimation and gesture recognition are essential for immersive augmented/virtual reality, human-computer interaction, and robotics. However, conventional frame-based cameras suffer from motion blur and limited dynamic range, while existing event-based methods are hindered by ego-motion interference, monocular depth ambiguity, and the lack of large-scale real-world stereo datasets. To overcome these limitations, we propose EgoEV-HandPose, an end-to-end framework for joint 3D bimanual pose estimation and gesture recognition from stereo event streams. Central to our approach is KeypointBEV, a flexible stereo fusion module that lifts features into a canonical bird's-eye-view space and employs an iterative reprojection-guided refinement loop to progressively resolve depth uncertainty and enforce kinematic consistency. In addition, we introduce EgoEVHands, the first large-scale real-world stereo event-camera dataset for egocentric hand perception, containing 5,419 annotated sequences with dense 3D/2D keypoints across 38 gesture classes under varying illumination. Extensive experiments demonstrate that EgoEV-HandPose achieves state-of-the-art performance with an MPJPE of 30.54 mm and 86.87% Top-1 gesture recognition accuracy, significantly outperforming RGB-based stereo and prior event-camera methods, particularly in low-light and bimanual occlusion scenarios, thereby setting a new benchmark for event-based egocentric perception. The established dataset and source code will be publicly released at https://github.com/ZJUWang01/EgoEV-HandPose.
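To give a feel for how a reprojection residual can drive a depth estimate, here is a toy sketch for a single keypoint seen by a rectified stereo pair. It is not KeypointBEV and makes no claim about the authors' implementation: the pinhole model, the Newton update, the function names `project` and `refine_depth`, and all camera parameters are assumptions chosen for illustration.

```python
def project(point, f, cx, cy, baseline=0.0):
    """Pinhole projection of a 3D point (left-camera frame, metres) into a
    rectified camera displaced by `baseline` along +x."""
    x, y, z = point
    return (f * (x - baseline) / z + cx, f * y / z + cy)

def refine_depth(uv_left, u_right_obs, z_init, f, cx, cy, baseline, iters=10):
    """Newton iterations on depth z: back-project the left pixel at the current
    depth, reproject into the right view, and shrink the column residual.
    Assumed toy scheme; it converges when 0 < z_init < 2 * true depth."""
    z = z_init
    for _ in range(iters):
        # Back-project the left pixel to 3D at the current depth guess.
        x = (uv_left[0] - cx) * z / f
        y = (uv_left[1] - cy) * z / f
        u_right_pred, _ = project((x, y, z), f, cx, cy, baseline)
        residual = u_right_pred - u_right_obs  # reprojection error in pixels
        slope = f * baseline / (z * z)         # d(u_right_pred) / dz
        z -= residual / slope
    return z

# Hypothetical camera: 500 px focal length, 6 cm baseline, 640x480 principal point.
F, CX, CY, B = 500.0, 320.0, 240.0, 0.06
true_point = (0.05, -0.02, 0.5)                # a keypoint 0.5 m from the camera
uv_l = project(true_point, F, CX, CY)          # observation in the left view
u_r, _ = project(true_point, F, CX, CY, B)     # observation in the right view

z_est = refine_depth(uv_l, u_r, z_init=0.3, f=F, cx=CX, cy=CY, baseline=B)
print(round(z_est, 6))
```

For a rectified pair the depth also has a closed form, z = f * baseline / disparity, so the loop is only there to show the predict, reproject, and correct pattern that a learned refinement module would apply to features rather than to a single scalar.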
Problem

Research questions and friction points this paper is trying to address.

Egocentric
3D Hand Pose Estimation
Gesture Recognition
Stereo Event Cameras
Depth Ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stereo Event Cameras
Egocentric Hand Pose Estimation
Bird's-Eye-View (BEV) Representation
Iterative Reprojection Refinement
Large-Scale Event Dataset