AI Summary
In 3D human pose estimation, data-driven methods generalize poorly across domains, while optimization-based approaches rely on 2D projection constraints that leave depth ambiguous and can overfit to the 2D evidence. To address both issues, this paper proposes an uncertainty-aware test-time optimization framework. Its core idea is joint-level uncertainty modeling, which decouples the 2D reprojection constraint from the 3D structural prior learned by the network. Specifically, it combines an uncertainty-weighted reprojection loss, latent-state optimization with the pretrained backbone frozen, and adaptive gradient clipping to modulate how strongly each joint is optimized. This design preserves the pretrained model's prior while mitigating domain shift. Experiments report a 4.5% improvement over the previous best result on Human3.6M and improved cross-domain generalization on MPI-INF-3DHP. The authors state that the source code will be released.
Abstract
Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel at fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on a projection constraint, which only ensures alignment in 2D space and can therefore lead to overfitting. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which keeps the prior information of the pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network that estimates the corresponding 3D pose while quantifying the uncertainty of each 3D joint. During testing, the proposed optimization framework freezes the pre-trained model and optimizes only a latent state. A projection loss is then employed to ensure that the generated poses are well aligned in 2D space. Furthermore, we use the uncertainty of each joint to determine how much that joint is allowed to change during optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on two challenging datasets: Human3.6M and MPI-INF-3DHP. Notably, our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M. Our source code will be released.
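The test-time procedure described in the abstract (freeze the model, optimize only a latent state against a projection loss, and let per-joint uncertainty limit the update) can be illustrated with a toy sketch. This is not the authors' implementation. The one-parameter `decode` function standing in for the pretrained network, the orthographic `project` camera, and the rule that scales each joint's step by an uncertainty in [0, 1] are all assumptions made for illustration.

```python
# Toy sketch: frozen decoder, orthographic camera, uncertainty-gated updates.
# All names and the gating rule are illustrative assumptions, not the paper's code.

SCALE = 2.0  # the only "weight" of the frozen stand-in decoder; never updated


def decode(latent):
    """Frozen stand-in for the pretrained network: latent -> 3D joints."""
    return [[SCALE * c for c in joint] for joint in latent]


def project(joints3d):
    """Orthographic projection (assumed): drop the depth coordinate."""
    return [joint[:2] for joint in joints3d]


def projection_loss(latent, target2d):
    """Sum of squared 2D errors between projected joints and 2D targets."""
    pred2d = project(decode(latent))
    return sum((p - t) ** 2
               for pj, tj in zip(pred2d, target2d)
               for p, t in zip(pj, tj))


def optimize_latent(latent, target2d, uncertainty, lr=0.1, steps=50):
    """Gradient descent on the latent only; each joint's step is scaled by its
    uncertainty in [0, 1], so confident joints stay near the initial estimate."""
    latent = [list(joint) for joint in latent]  # copy; caller's data untouched
    for _ in range(steps):
        pred2d = project(decode(latent))
        for j, (pj, tj) in enumerate(zip(pred2d, target2d)):
            for k in range(2):  # depth (index 2) gets zero gradient here
                grad = 2.0 * (pj[k] - tj[k]) * SCALE  # chain rule through decode
                latent[j][k] -= lr * uncertainty[j] * grad
    return latent


# Two joints: the first is confident (u = 0), the second is uncertain (u = 1).
init_latent = [[0.0, 0.0, 1.0], [0.5, 0.5, 1.0]]
target2d = [[0.2, 0.0], [2.0, 1.0]]
uncertainty = [0.0, 1.0]

refined = optimize_latent(init_latent, target2d, uncertainty)
```

In this sketch the confident joint keeps its initial estimate even though its projection disagrees slightly with the 2D target, while the uncertain joint is pulled onto its target. The depth coordinate never changes, because the projection loss carries no depth gradient under the assumed camera; this is the sense in which a projection constraint alone only enforces 2D alignment.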