Unsupervised 3D Human Pose Estimation via Conditional Multi-view Ancestral Sampling

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes an unsupervised method for single-view 2D-to-3D human pose lifting that uses no ground-truth 3D annotations. Its core component, conditional multi-view ancestral sampling (cMAS), extends multi-view ancestral sampling of diffusion models to this task. It uses the priors of a 2D motion diffusion model (MDM) pretrained on large 2D human pose datasets to refine a 3D pose so that its multi-view projections follow the manifold in the 2D MDM's noise space, while the pose stays consistent with the input 2D pose and with human anatomical constraints. On the Yoga dataset, the method achieves better cross-domain performance than state-of-the-art supervised and unsupervised 3D pose estimators, including on extreme poses for which 3D supervision is unavailable.
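The refinement described above, where multi-view projections are pulled toward per-view 2D targets while the input 2D pose and bone lengths are respected, can be illustrated with a minimal sketch. It assumes orthographic virtual cameras rotated about the vertical axis, a hypothetical 5-joint toy skeleton (`BONES`), known reference bone lengths, and plain gradient descent. `targets` stands in for the per-view 2D poses that the pretrained 2D MDM would supply at a given diffusion step. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton given as (parent, child) joint indices.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]


def view_rotations(num_views: int) -> np.ndarray:
    """(K, 3, 3) rotations about the vertical y-axis; view 0 is the identity (input camera)."""
    angles = 2.0 * np.pi * np.arange(num_views) / num_views
    c, s = np.cos(angles), np.sin(angles)
    R = np.zeros((num_views, 3, 3))
    R[:, 0, 0], R[:, 0, 2] = c, s
    R[:, 1, 1] = 1.0
    R[:, 2, 0], R[:, 2, 2] = -s, c
    return R


def project(X: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Orthographic projection of a (J, 3) pose into every view -> (K, J, 2)."""
    return np.einsum("kab,jb->kja", R, X)[..., :2]


def loss_and_grad(X, R, targets, x_input, bone_len, lam_2d=1.0, lam_bone=1.0):
    """Multi-view term + input-2D conditioning term + bone-length (anatomical) term."""
    resid = project(X, R) - targets                          # (K, J, 2)
    loss = np.sum(resid ** 2)
    pad = np.concatenate([resid, np.zeros(resid.shape[:2] + (1,))], axis=-1)
    grad = 2.0 * np.einsum("kab,kja->jb", R, pad)            # chain rule through each R_k

    resid_in = X[:, :2] - x_input                            # view 0 must reproduce the input 2D pose
    loss += lam_2d * np.sum(resid_in ** 2)
    grad[:, :2] += 2.0 * lam_2d * resid_in

    for (i, j), length in zip(BONES, bone_len):             # keep each bone at its reference length
        d = X[j] - X[i]
        n = np.linalg.norm(d) + 1e-8
        loss += lam_bone * (n - length) ** 2
        g = 2.0 * lam_bone * (n - length) * d / n
        grad[j] += g
        grad[i] -= g
    return loss, grad


def fuse_views(X_init, R, targets, x_input, bone_len, steps=500, lr=0.01):
    """Plain gradient descent on the combined objective (one possible solver)."""
    X = X_init.copy()
    for _ in range(steps):
        _, g = loss_and_grad(X, R, targets, x_input, bone_len)
        X -= lr * g
    return X


# Toy check: noise-free targets rendered from a known pose.
X_true = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 2.0, 0.3],
                   [1.0, 0.0, -0.4], [1.5, -1.0, 0.2]])
bone_len = [np.linalg.norm(X_true[j] - X_true[i]) for i, j in BONES]
R = view_rotations(4)
targets = project(X_true, R)
x_input = X_true[:, :2]
X_init = np.concatenate([x_input, np.zeros((5, 1))], axis=1)  # lift with zero depth
X_hat = fuse_views(X_init, R, targets, x_input, bone_len)
```

With noise-free targets rendered from a known pose, `X_hat` recovers that pose up to numerical tolerance. Combining the three terms as a weighted sum of squares is a design choice of this sketch; the paper's actual weighting, camera model, and solver may differ.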
📝 Abstract
We propose a method for estimating a 3D human pose from a single view without 3D supervision. The key to our method is to leverage the 2D diffusion priors of motion diffusion models (MDMs) pre-trained on large 2D human pose datasets. Specifically, we extend multi-view ancestral sampling of diffusion models to the task of 2D-3D lifting of human pose. To this end, we propose conditional multi-view ancestral sampling (cMAS), which optimizes the 3D pose such that its multi-view projections follow the manifold in 2D MDM noise space, while conditioning the 3D pose to match the given 2D poses and the anatomical constraints of humans. Experiments on the Yoga dataset demonstrate that our method achieves better cross-domain performance than state-of-the-art supervised and unsupervised 3D pose estimation methods, including on extreme human poses where 3D supervision is unavailable. Code is available at: https://github.com/asaa0001/c-MAS.
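The ancestral-sampling part can be sketched with the standard DDPM posterior q(x_{t-1} | x_t, x_0), applied to every virtual view. This sketch assumes a linear beta schedule, a single frame of J joints per view (a simplification, since MDMs denoise motion sequences), and independent Gaussian noise per view. How cMAS actually re-noises the views is not specified in this summary, and `ddpm_schedule`, `posterior_params`, and `ancestral_step` are hypothetical helpers.

```python
import numpy as np


def ddpm_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear beta schedule (a common DDPM default, assumed here)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def posterior_params(t, betas, alphas, alpha_bars):
    """Coefficients of q(x_{t-1} | x_t, x_0) = N(c0 * x_0 + ct * x_t, var * I), for t >= 1."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    c0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    ct = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
    return c0, ct, var


def ancestral_step(x_t, x0_hat, t, betas, alphas, alpha_bars, rng):
    """One ancestral step for all views; x_t and x0_hat have shape (K, J, 2)."""
    if t == 0:
        return x0_hat  # final step: return the clean estimate without adding noise
    c0, ct, var = posterior_params(t, betas, alphas, alpha_bars)
    noise = rng.standard_normal(x_t.shape)  # independent per-view noise (an assumption)
    return c0 * x0_hat + ct * x_t + np.sqrt(var) * noise


rng = np.random.default_rng(0)
betas, alphas, alpha_bars = ddpm_schedule()
K, J = 4, 5
x_t = rng.standard_normal((K, J, 2))
for t in reversed(range(len(betas))):
    # Hypothetical stand-in: in a (c)MAS-style loop, x0_hat would come from the 2D MDM's
    # per-view predictions after fusion into one 3D pose and reprojection.
    x0_hat = np.zeros((K, J, 2))
    x_t = ancestral_step(x_t, x0_hat, t, betas, alphas, alpha_bars, rng)
```

In a full loop, each iteration would call the 2D denoiser on every view, fuse the predictions into a single 3D pose that also satisfies the input 2D pose and bone constraints, reproject it, and only then take the step above; the fixed zero pose here just keeps the sketch self-contained.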
Problem

Unsupervised 3D Human Pose Estimation
Single-view 3D Pose
3D Pose Lifting
Cross-domain Pose Estimation
No 3D Supervision
Innovation

conditional multi-view ancestral sampling
unsupervised 3D human pose estimation
motion diffusion models
2D-3D pose lifting
diffusion priors