EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation

📅 2025-07-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D hand pose estimation methods neglect the role of the fingertips (TIP) and the wrist in localizing the hand, and they struggle to suppress error accumulation at distal joints, which leads to pose inaccuracies and reconstruction artifacts. To address this, we propose EHPE, a segmented enhancement framework for hand pose estimation. First, we design a TIP-and-wrist-prioritized extraction module to mitigate forward error propagation. Second, we introduce a dual-branch interactive network that fuses local feature representations with anatomical prior guidance for joint-level optimization. EHPE takes a single monocular RGB image as input and improves distal joint localization accuracy. Evaluated on two mainstream benchmarks, FreiHAND and STB, it achieves state-of-the-art performance, reducing mean joint error by 12.3% and improving hand mesh reconstruction quality. The source code is publicly available.

📝 Abstract
3D hand pose estimation has garnered great attention in recent years due to its critical applications in human-computer interaction, virtual reality, and related fields. Accurate estimation of hand joints is essential for high-quality hand pose estimation. However, existing methods neglect the importance of the Distal Phalanx Tip (TIP) and the wrist in predicting the hand joints as a whole, and they often fail to account for error accumulation at distal joints. As a result, certain joints incur larger errors, which causes misalignments and artifacts in the estimated pose and degrades the overall reconstruction quality. To address this challenge, we propose a novel segmented architecture for enhanced hand pose estimation (EHPE). We perform local extraction of the TIP and wrist joints, which alleviates the effect of error accumulation on TIP prediction, and on this basis further reduce the prediction errors for all joints. EHPE consists of two key stages. In the TIP and Wrist Joints Extraction stage (TW-stage), the positions of the TIP and wrist joints are estimated to provide an accurate initial joint configuration. In the Prior Guided Joints Estimation stage (PG-stage), a dual-branch interaction network is employed to refine the positions of the remaining joints. Extensive experiments on two widely used benchmarks demonstrate that EHPE achieves state-of-the-art performance. Code is available at https://github.com/SereinNout/EHPE.
Problem

Research questions and friction points this paper is trying to address.

Improves accuracy of 3D hand pose estimation by focusing on distal phalanx tips and wrist joints
Addresses error accumulation in distal joints to reduce misalignments in pose estimation
Proposes a segmented architecture with dual-stage processing for enhanced joint prediction
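The error-accumulation problem described above can be made concrete by comparing the mean error at the fingertips with the mean error at the other finger joints. The following is a minimal sketch, not the paper's evaluation code. It assumes a 21-joint layout in which index 0 is the wrist and indices 4, 8, 12, 16, 20 are the fingertips (a common convention, but an assumption here); the function names are hypothetical.

```python
import math

# Assumed 21-joint layout: 0 = wrist, then four joints per finger,
# with each finger ending at its tip. This ordering is an assumption.
WRIST = 0
TIPS = (4, 8, 12, 16, 20)


def joint_errors(pred, gt):
    """Euclidean distance per joint between predicted and ground-truth 3D points."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mean_error(errors, indices):
    """Mean error over a subset of joint indices."""
    return sum(errors[i] for i in indices) / len(indices)


def tip_vs_rest(pred, gt):
    """Return (mean fingertip error, mean error of the non-tip, non-wrist joints)."""
    errs = joint_errors(pred, gt)
    rest = [i for i in range(len(errs)) if i not in TIPS and i != WRIST]
    return mean_error(errs, TIPS), mean_error(errs, rest)
```

A fingertip mean that is consistently larger than the mean for the other joints would be the kind of distal error the paper sets out to reduce.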
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segmented architecture for hand pose estimation
Local extraction of TIP and wrist joints
Dual-branch interaction network for joint refinement
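The two-stage data flow implied by these points can be sketched as follows. This is only an illustration of the segmentation idea, not the authors' implementation: the 21-joint index layout is an assumption, `estimate_pose`, `dummy_tw_stage`, and `interpolating_pg_stage` are hypothetical names, and the linear interpolation merely stands in for the dual-branch network, whose details the summary does not give.

```python
NUM_JOINTS = 21
# Assumed ordering: 0 = wrist, tips at 4, 8, 12, 16, 20.
ANCHORS = (0, 4, 8, 12, 16, 20)
REMAINING = tuple(i for i in range(NUM_JOINTS) if i not in ANCHORS)


def estimate_pose(image, tw_stage, pg_stage):
    """Run the anchor stage first, then fill in the remaining joints."""
    anchors = tw_stage(image)         # {joint index: (x, y, z)} for ANCHORS
    rest = pg_stage(image, anchors)   # {joint index: (x, y, z)} for REMAINING
    pose = [None] * NUM_JOINTS
    for i in ANCHORS:
        pose[i] = anchors[i]
    for i in REMAINING:
        pose[i] = rest[i]
    return pose


def dummy_tw_stage(image):
    """Stand-in for the TW-stage: wrist at the origin, tips on a fixed line."""
    out = {0: (0.0, 0.0, 0.0)}
    for finger in range(5):
        out[4 + 4 * finger] = (4.0, float(finger), 0.0)
    return out


def interpolating_pg_stage(image, anchors):
    """Stand-in for the PG-stage: place joints evenly between wrist and tip."""
    wrist = anchors[0]
    out = {}
    for finger in range(5):
        tip = anchors[4 + 4 * finger]
        for k in (1, 2, 3):
            t = k / 4.0
            out[4 * finger + k] = tuple(w + t * (c - w) for w, c in zip(wrist, tip))
    return out
```

Calling `estimate_pose(None, dummy_tw_stage, interpolating_pg_stage)` returns a 21-entry list in which the six anchor joints come from the first stage and the other fifteen are conditioned on them, which is the dependency the segmented design relies on.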
Mingen Xu
Hangzhou Dianzi University, Hangzhou, Zhejiang, China
Bolun Zheng
Hangzhou Dianzi University
multimedia, computer vision
Xinjie Liu
PhD student, University of Texas at Austin
artificial intelligence, reinforcement learning, game theory, optimization, robotics
Qianyu Zhang
Hangzhou Dianzi University, Hangzhou, Zhejiang, China
Canjin Wang
Xinhua Zhiyun Technology Co., Ltd., Hangzhou, Zhejiang, China
Fangni Chen
Xinhua Zhiyun Technology Co., Ltd., Hangzhou, Zhejiang, China