DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

📅 2025-08-01
🤖 AI Summary
To address the scarcity of high-quality 3D full-body pose data and the challenge of modeling inter-joint dependencies, this paper proposes DPoser-X, the first general-purpose 3D full-body pose prior framework based on diffusion models. It unifies inverse problems including pose estimation, completion, and generation, introducing truncated timestep scheduling and mask-based training to effectively integrate heterogeneous pose data from the full body, hands, and face, while explicitly capturing cross-part structural dependencies. Leveraging variational diffusion sampling, DPoser-X robustly reconstructs complete 3D poses from sparse, occluded, or partial observations. Extensive evaluations on AMASS, CAPE, and H36M demonstrate significant improvements over state-of-the-art methods, validating its strong generalization capability and cross-scenario applicability.

πŸ“ Abstract
We present DPoser-X, a diffusion-based prior model for 3D whole-body human poses. Building a versatile and robust full-body human pose prior remains challenging due to the inherent complexity of articulated human poses and the scarcity of high-quality whole-body pose datasets. To address these limitations, we introduce a Diffusion model as body Pose prior (DPoser) and extend it to DPoser-X for expressive whole-body human pose modeling. Our approach unifies various pose-centric tasks as inverse problems, solving them through variational diffusion sampling. To enhance performance on downstream applications, we introduce a novel truncated timestep scheduling method specifically designed for pose data characteristics. We also propose a masked training mechanism that effectively combines whole-body and part-specific datasets, enabling our model to capture interdependencies between body parts while avoiding overfitting to specific actions. Extensive experiments demonstrate DPoser-X's robustness and versatility across multiple benchmarks for body, hand, face, and full-body pose modeling. Our model consistently outperforms state-of-the-art alternatives, establishing a new benchmark for whole-body human pose prior modeling.
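The abstract describes solving pose tasks as inverse problems with a diffusion prior and a truncated timestep schedule, but gives no implementation details. The following is only a toy sketch of that general idea, not the paper's algorithm: the learned denoiser is replaced by the closed-form posterior mean of a Gaussian prior, the task is a simple completion problem (some coordinates observed, others missing), and the names `truncated_timesteps`, `noise_levels`, `toy_denoiser`, and `complete_pose`, as well as the range `[0.05, 0.2]` and all weights, are assumptions chosen for illustration.

```python
import math
import random


def truncated_timesteps(num_steps, t_max=0.2, t_min=0.05):
    """Anneal t linearly from t_max down to t_min.

    Restricting t to a narrow low-noise range is the 'truncation';
    the specific bounds here are hypothetical.
    """
    if num_steps == 1:
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]


def noise_levels(t):
    """Toy variance-preserving schedule with alpha^2 + sigma^2 = 1."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha, sigma


def toy_denoiser(x_t, t, mean, std):
    """Stand-in for a trained network: E[x0 | x_t] under a prior N(mean, std^2)."""
    alpha, sigma = noise_levels(t)
    gain = alpha * std ** 2 / (alpha ** 2 * std ** 2 + sigma ** 2)
    return [m + gain * (xt - alpha * m) for xt, m in zip(x_t, mean)]


def complete_pose(obs, observed, mean, std,
                  num_steps=200, lr=0.1, prior_weight=0.5, seed=0):
    """Gradient descent on a data term plus a denoiser-based prior term."""
    rng = random.Random(seed)
    x = list(mean)  # start from the prior mean
    for t in truncated_timesteps(num_steps):
        alpha, sigma = noise_levels(t)
        # Perturb the current estimate to noise level t, then denoise it.
        x_t = [alpha * xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        x0_hat = toy_denoiser(x_t, t, mean, std)  # treated as a constant target
        for i in range(len(x)):
            grad = prior_weight * (x[i] - x0_hat[i])  # pull toward the prior
            if observed[i]:
                grad += x[i] - obs[i]  # pull toward the observation
            x[i] -= lr * grad
    return x


if __name__ == "__main__":
    estimate = complete_pose(obs=[0.3, 0.0], observed=[True, False],
                             mean=[0.0, 0.0], std=0.1)
    print(estimate)
```

In this sketch the observed coordinate settles between its measurement and the prior mean, while the unobserved coordinate stays near the prior mean. A real pose prior would replace `toy_denoiser` with a trained network, which is where cross-part dependencies would enter.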
Problem

Research questions and friction points this paper is trying to address.

Modeling 3D whole-body human poses robustly
Addressing scarcity of high-quality pose datasets
Unifying pose tasks via diffusion-based inverse problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion model for 3D whole-body pose prior
Truncated timestep scheduling for pose data
Masked training combining whole-body and part-specific datasets
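One plausible reading of masked training is that a sample from a part-specific dataset (for example, hands only) contributes to the loss only on the dimensions it actually annotates. The sketch below illustrates that reading under stated assumptions: the pose-vector layout in `PART_SLICES`, the dimension `POSE_DIM`, and the helpers `part_mask` and `masked_mse` are all hypothetical, and the paper's actual mechanism may differ.

```python
# Hypothetical layout of a flattened whole-body pose vector.
PART_SLICES = {"body": slice(0, 4), "hands": slice(4, 8), "face": slice(8, 10)}
POSE_DIM = 10


def part_mask(available_parts):
    """Return a 0/1 mask marking the dimensions annotated in a sample."""
    mask = [0.0] * POSE_DIM
    for name in available_parts:
        part = PART_SLICES[name]
        for i in range(part.start, part.stop):
            mask[i] = 1.0
    return mask


def masked_mse(prediction, target, mask):
    """Mean squared error over the masked-in dimensions only."""
    total = sum(m * (p - t) ** 2 for p, t, m in zip(prediction, target, mask))
    count = sum(mask)
    return total / count if count > 0 else 0.0


if __name__ == "__main__":
    # A hand-only sample: errors on body and face dimensions are ignored.
    target = [0.0] * POSE_DIM
    prediction = [1.0] * 4 + [0.5] * 4 + [1.0] * 2
    print(masked_mse(prediction, target, part_mask(["hands"])))
```

With such a mask, whole-body samples supervise every dimension while hand-only or face-only samples supervise just their own slice, so all datasets can share one model.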
Junzhe Lu
Tsinghua University
computer vision, generative modeling
Jing Lin
Nanyang Technological University
Hongkun Dou
Beihang University
Ailing Zeng
Anuttacon
Deep Learning, Computer Vision, Virtual Humans, Video Generation
Yue Deng
Beihang University
Xian Liu
NVIDIA Research
Zhongang Cai
SenseTime Research
Lei Yang
SenseTime Research
Yulun Zhang
Shanghai Jiao Tong University
Haoqian Wang
Tsinghua University
Ziwei Liu
Associate Professor, Nanyang Technological University
Computer Vision, Machine Learning, Computer Graphics