PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Estimating 3D human pose from a single RGB image is highly susceptible to error propagation from inaccurate 2D pose estimates and self-occlusion. To address these challenges, this work proposes a unified intermediate representation in a 3D anchor space, leveraging joint-level 3D anchors, a depth-aware feature enhancement mechanism, and an anchor-feature interaction decoder to effectively mitigate error propagation and handle self-occlusion. Furthermore, an ensemble prediction strategy from anchors to joints is introduced, significantly improving reconstruction accuracy. The method consistently outperforms existing approaches on the Human3.6M, MPI-INF-3DHP, and 3DPW benchmarks, achieving a 14.7% reduction in MPJPE on the challenging scenarios of Human3.6M.

📝 Abstract
3D human pose lifting from a single RGB image is a challenging task in 3D vision. Existing methods typically establish a direct joint-to-joint mapping from 2D to 3D poses based on 2D features. This formulation suffers from two fundamental limitations: inevitable error propagation from the predicted 2D input pose to the 3D predictions, and inherent difficulty in handling self-occlusion. In this paper, we propose PandaPose, a 3D human pose lifting approach that propagates the 2D pose prior into a 3D anchor space serving as a unified intermediate representation. Specifically, our 3D anchor space comprises: (1) joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors that mitigate 2D pose estimation inaccuracies; (2) depth-aware joint-wise feature lifting that hierarchically integrates depth information to resolve self-occlusion ambiguities; and (3) an anchor-feature interaction decoder that fuses the 3D anchors with the lifted features to generate unified anchor queries encapsulating the joint-wise 3D anchor set, visual cues, and geometric depth information. The anchor queries are further employed to drive anchor-to-joint ensemble prediction. Experiments on three well-established benchmarks (Human3.6M, MPI-INF-3DHP, and 3DPW) demonstrate the superiority of our approach. A substantial $14.7\%$ error reduction over SOTA methods under the challenging conditions of Human3.6M, together with qualitative comparisons, further showcases the effectiveness and robustness of our approach.
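The abstract does not come with code, but the anchor-to-joint ensemble prediction it describes can be sketched compactly. The sketch below assumes each joint carries a small set of 3D anchor candidates that are refined by predicted offsets and blended with softmax weights over predicted scores; the function name, tensor shapes, and the offset/softmax formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def anchor_to_joint_ensemble(anchors, scores, offsets):
    """Blend per-joint 3D anchor candidates into final joint predictions.

    anchors: (J, A, 3) candidate 3D anchors per joint (canonical coordinates)
    scores:  (J, A)    predicted confidence logits, one per anchor
    offsets: (J, A, 3) predicted per-anchor 3D refinement offsets
    returns: (J, 3)    ensembled 3D joint positions
    """
    # Softmax over the anchor axis turns logits into ensemble weights.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)          # (J, A)
    # Each anchor votes with its refined position; the weights blend the votes.
    refined = anchors + offsets                   # (J, A, 3)
    return (w[..., None] * refined).sum(axis=1)   # (J, 3)

# Toy example: 17 joints (Human3.6M convention), 8 anchors each.
rng = np.random.default_rng(0)
joints3d = anchor_to_joint_ensemble(
    rng.normal(size=(17, 8, 3)),
    rng.normal(size=(17, 8)),
    0.01 * rng.normal(size=(17, 8, 3)),
)
print(joints3d.shape)  # (17, 3)
```

Because the weights are a convex combination, each joint's prediction stays inside the convex hull of its refined anchors, which is one plausible reading of how an anchor prior can bound the error introduced by a noisy 2D input pose.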
Problem

Research questions and friction points this paper is trying to address.

3D human pose estimation
single-image 3D pose lifting
2D-to-3D pose mapping
self-occlusion
error propagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D human pose estimation
pose lifting
3D anchor space
depth-aware feature lifting
anchor-feature interaction
Jinghong Zheng
Huazhong University of Science and Technology
Computer Vision · Pose Estimation

Changlong Jiang
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

Yang Xiao
PhD student, University of Technology Sydney
3DGS · AIGC · LLM

Jiaqi Li
Huazhong University of Science and Technology
Computer Vision · Depth Estimation

Haohong Kuang
School of Journalism and Information Communication, Huazhong University of Science and Technology

Hang Xu
King Abdullah University of Science and Technology
Deep Learning · Distributed Systems

Ran Wang
School of Journalism and Information Communication, Huazhong University of Science and Technology; School of Future Technology, Huazhong University of Science and Technology

Zhiguo Cao
Huazhong University of Science and Technology
Pattern Recognition · Computer Vision

Min Du
NVIDIA
LLM · RAG · Machine Learning · Security

Joey Tianyi Zhou
A*STAR and NUS
Efficient AI · Robust & Safe AI