Semantics-aware Test-time Adaptation for 3D Human Pose Estimation

📅 2025-02-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing test-time adaptation (TTA) methods for 3D human pose estimation suffer from semantic misalignment, which yields over-smoothed predictions and leaves adaptation unguided under occlusion or truncation. Method: This paper proposes a semantics-aware motion prior for TTA, which the authors describe as the first of its kind. It constructs a joint motion-text embedding space, achieves cross-modal semantic alignment via contrastive learning, and introduces a semantics-guided 2D pose completion mechanism for missing joints, enabling robust 2D→3D pose inference. Results: On 3DPW and 3DHP, the method reduces PA-MPJPE by more than 12% compared to state-of-the-art TTA approaches, indicating improved out-of-distribution generalization and occlusion robustness and supporting the role of semantic guidance in both.
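The contrastive alignment mentioned in the summary can be illustrated with a symmetric InfoNCE loss over paired motion and text embeddings. This is a minimal NumPy sketch and not the paper's implementation: the function names, the temperature value of 0.07, and the choice of a symmetric loss are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(motion_emb, text_emb, temperature=0.07):
    """Hypothetical contrastive loss: row i of motion_emb is paired with row i of text_emb.

    motion_emb, text_emb: arrays of shape (batch, dim).
    temperature: assumed value, not taken from the paper.
    """
    m = l2_normalize(motion_emb)
    t = l2_normalize(text_emb)
    logits = m @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(m))                 # matching pairs sit on the diagonal
    loss_m2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # motion -> text
    loss_t2m = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> motion
    return 0.5 * (loss_m2t + loss_t2m)
```

Minimizing such a loss pulls each motion embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is one common way to obtain a shared embedding space.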

📝 Abstract

This work highlights a semantics misalignment in 3D human pose estimation. For the task of test-time adaptation, the misalignment manifests as overly smoothed and unguided predictions. The smoothing settles predictions towards some average pose. Furthermore, when there are occlusions or truncations, the adaptation becomes fully unguided. To this end, we pioneer the integration of a semantics-aware motion prior for the test-time adaptation of 3D pose estimation. We leverage video understanding and a well-structured motion-text space to adapt the model motion prediction to adhere to video semantics during test time. Additionally, we incorporate a missing 2D pose completion based on the motion-text similarity. The pose completion strengthens the motion prior's guidance for occlusions and truncations. Our method significantly improves state-of-the-art 3D human pose estimation TTA techniques, with more than 12% decrease in PA-MPJPE on 3DPW and 3DHP.
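For readers unfamiliar with the metric quoted above, PA-MPJPE is the mean per-joint position error measured after a Procrustes (similarity) alignment of the prediction to the ground truth. The sketch below shows the standard single-frame computation in NumPy; the function names and the (J, 3) array layout are assumptions, and the paper's evaluation code may differ in details such as units or joint sets.

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                 # center both point sets
    u, s, vt = np.linalg.svd(p.T @ g)             # SVD of the 3x3 cross-covariance
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    corr = np.array([1.0, 1.0, d])
    rot = u @ np.diag(corr) @ vt                  # row-vector convention: p @ rot ~ g
    scale = (s * corr).sum() / (p ** 2).sum()     # least-squares optimal scale
    return scale * p @ rot + mu_g

def mpjpe(pred, gt):
    """Mean Euclidean distance over joints, without any alignment."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE computed after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global rotation, scale, and translation, PA-MPJPE isolates errors in the articulated pose itself.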

Problem

Research questions and friction points this paper is trying to address.

Semantics misalignment in 3D pose estimation

Overly smoothed pose predictions

Unguided adaptation under occlusion or truncation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantics-aware motion prior integration

Video understanding for model adaptation

2D pose completion using motion-text similarity
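The last item, 2D pose completion using motion-text similarity, can be pictured with a toy example. The source does not spell out the procedure, so the following is a hypothetical sketch rather than the authors' method: it assumes a set of candidate 2D poses with precomputed motion embeddings, a text embedding describing the video, and a per-joint visibility mask, and it simply fills missing joints from the candidate whose embedding has the highest cosine similarity to the text.

```python
import numpy as np

def complete_pose_2d(observed, visible, candidates, cand_emb, text_emb):
    """Fill missing 2D joints from the candidate most similar to the text (illustrative only).

    observed:   (J, 2) detected 2D keypoints; rows for missing joints are ignored.
    visible:    (J,) boolean mask, True where the detection is trusted.
    candidates: (K, J, 2) hypothetical candidate 2D poses.
    cand_emb:   (K, D) hypothetical motion embeddings of the candidates.
    text_emb:   (D,) hypothetical text embedding of the video semantics.
    """
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    best = int(np.argmax(c @ t))                  # highest cosine similarity
    # Keep trusted detections, borrow the rest from the selected candidate.
    completed = np.where(visible[:, None], observed, candidates[best])
    return completed, best
```

The completed 2D pose could then serve as a supervision target during adaptation, so that occluded or truncated joints are no longer left unconstrained.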

🔎 Similar Papers

Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency