PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Estimating 3D human pose from a single RGB image is highly susceptible to error propagation from inaccurate 2D pose estimates and self-occlusion. To address these challenges, this work proposes a unified intermediate representation in a 3D anchor space, leveraging joint-level 3D anchors, a depth-aware feature enhancement mechanism, and an anchor-feature interaction decoder to effectively mitigate error propagation and handle self-occlusion. Furthermore, an ensemble prediction strategy from anchors to joints is introduced, significantly improving reconstruction accuracy. The method consistently outperforms existing approaches on the Human3.6M, MPI-INF-3DHP, and 3DPW benchmarks, achieving a 14.7% reduction in MPJPE on the challenging scenarios of Human3.6M.

📝 Abstract
3D human pose lifting from a single RGB image is a challenging task in 3D vision. Existing methods typically establish a direct joint-to-joint mapping from 2D to 3D poses based on 2D features. This formulation suffers from two fundamental limitations: inevitable error propagation from the predicted 2D input pose to the 3D predictions, and inherent difficulty in handling self-occlusion. In this paper, we propose PandaPose, a 3D human pose lifting approach that propagates the 2D pose prior into a 3D anchor space serving as a unified intermediate representation. Specifically, our 3D anchor space comprises: (1) joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors that mitigate 2D pose estimation inaccuracies; (2) depth-aware joint-wise feature lifting that hierarchically integrates depth information to resolve self-occlusion ambiguities; and (3) an anchor-feature interaction decoder that fuses the 3D anchors with the lifted features to generate unified anchor queries encapsulating the joint-wise 3D anchor set, visual cues, and geometric depth information. The anchor queries are further employed to drive anchor-to-joint ensemble prediction. Experiments on three well-established benchmarks (Human3.6M, MPI-INF-3DHP, and 3DPW) demonstrate the superiority of our approach. A substantial $14.7\%$ error reduction over SOTA methods under the challenging conditions of Human3.6M, together with qualitative comparisons, further showcases the effectiveness and robustness of our approach.
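The abstract does not come with code, but the anchor-to-joint ensemble prediction it describes can be sketched compactly. The sketch below assumes each joint carries a small set of 3D anchor candidates that are refined by predicted offsets and blended with softmax weights over predicted scores; the function name, tensor shapes, and the offset/softmax formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def anchor_to_joint_ensemble(anchors, scores, offsets):
    """Blend per-joint 3D anchor candidates into final joint predictions.

    anchors: (J, A, 3) candidate 3D anchors per joint (canonical coordinates)
    scores:  (J, A)    predicted confidence logits, one per anchor
    offsets: (J, A, 3) predicted per-anchor 3D refinement offsets
    returns: (J, 3)    ensembled 3D joint positions
    """
    # Softmax over the anchor axis turns logits into ensemble weights.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)          # (J, A)
    # Each anchor votes with its refined position; the weights blend the votes.
    refined = anchors + offsets                   # (J, A, 3)
    return (w[..., None] * refined).sum(axis=1)   # (J, 3)

# Toy example: 17 joints (Human3.6M convention), 8 anchors each.
rng = np.random.default_rng(0)
joints3d = anchor_to_joint_ensemble(
    rng.normal(size=(17, 8, 3)),
    rng.normal(size=(17, 8)),
    0.01 * rng.normal(size=(17, 8, 3)),
)
print(joints3d.shape)  # (17, 3)
```

Because the weights are a convex combination, each joint's prediction stays inside the convex hull of its refined anchors, which is one plausible reading of how an anchor prior can bound the error introduced by a noisy 2D input pose.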
Problem

Research questions and friction points this paper is trying to address.

3D human pose estimation
single-image 3D pose lifting
2D-to-3D pose mapping
self-occlusion
error propagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D human pose estimation
pose lifting
3D anchor space
depth-aware feature lifting
anchor-feature interaction
Jinghong Zheng
Huazhong University of Science and Technology
Computer Vision · Pose Estimation

Changlong Jiang
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

Yang Xiao
PhD student, University of Technology Sydney
3DGS · AIGC · LLM

Jiaqi Li
Huazhong University of Science and Technology
Computer Vision · Depth Estimation

Haohong Kuang
School of Journalism and Information Communication, Huazhong University of Science and Technology

Hang Xu
King Abdullah University of Science and Technology
Deep Learning · Distributed Systems

Ran Wang
School of Journalism and Information Communication, Huazhong University of Science and Technology; School of Future Technology, Huazhong University of Science and Technology

Zhiguo Cao
Huazhong University of Science and Technology
Pattern Recognition · Computer Vision

Min Du
NVIDIA
LLM · RAG · Machine Learning · Security

Joey Tianyi Zhou
A*STAR and NUS
Efficient AI · Robust & Safe AI