🤖 AI Summary
To address the limited generalizability of 3D human motion prediction (HMP) models, which stems from their reliance on costly, domain-specific motion-capture (MoCap) data, this paper proposes a test-time domain adaptation framework that requires no additional MoCap annotations. The method leverages 2D pose estimates derived from monocular video and reconstructs MoCap-style pseudo-3D motion sequences via a lightweight 3D motion reconstruction pipeline. It further introduces the first test-domain-aware lightweight adaptation paradigm, enabling end-to-end fine-tuning of pre-trained HMP models directly on target-domain inputs. This approach substantially improves robustness to unseen actions and cross-subject scenarios, achieving a 12.3% reduction in MPJPE across multiple benchmarks; qualitative evaluations confirm its superior generalization on complex, dynamic motions.
📝 Abstract
In 3D Human Motion Prediction (HMP), conventional methods train HMP models on expensive motion capture data. However, the high cost of collecting such data limits its diversity, which leads to poor generalizability to unseen motions or subjects. To address this issue, this paper proposes to enhance HMP with additional learning on poses estimated from easily available videos. The 2D poses estimated from monocular videos are carefully transformed into motion capture-style 3D motions through our pipeline, and additional learning on the obtained motions adapts the HMP model to the test domain. The experimental results demonstrate the quantitative and qualitative impact of our method.
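The adaptation loop described above can be sketched in a few lines. This is a minimal illustrative mock-up, not the paper's implementation: the `lift_to_3d` stand-in, the toy linear predictor, and the manual gradient step are all assumptions introduced here to show the shape of the pipeline (2D poses → pseudo-3D motions → test-time fine-tuning on a prediction loss).

```python
import numpy as np

rng = np.random.default_rng(0)

def lift_to_3d(poses_2d):
    """Stand-in for a 2D-to-3D lifting step (hypothetical): here we simply
    append a zero depth channel to produce MoCap-style pseudo-3D poses."""
    depth = np.zeros(poses_2d.shape[:-1] + (1,))
    return np.concatenate([poses_2d, depth], axis=-1)  # (T, J, 3)

# Pseudo-3D sequence reconstructed from video-derived 2D pose estimates.
poses_2d = rng.normal(size=(16, 17, 2))       # T frames, J joints, (x, y)
motion_3d = lift_to_3d(poses_2d)              # (16, 17, 3)

# Toy linear HMP model: predict the next frame from the current one.
X = motion_3d[:-1].reshape(15, -1)            # current frames, flattened
Y = motion_3d[1:].reshape(15, -1)             # next frames (targets)
W = rng.normal(scale=0.01, size=(X.shape[1], X.shape[1]))  # "pre-trained" weights

def mpjpe(W):
    # Mean per-joint position error of one-step predictions.
    err = (X @ W - Y).reshape(15, 17, 3)
    return np.linalg.norm(err, axis=-1).mean()

mpjpe_before = mpjpe(W)

# Test-time adaptation: a few gradient steps on the MSE prediction loss.
lr = 1e-3
for _ in range(50):
    pred = X @ W
    W -= lr * 2 * X.T @ (pred - Y) / len(X)   # d(MSE)/dW

mpjpe_after = mpjpe(W)
```

In the paper's setting the linear predictor would be a pre-trained HMP network and the lifting step a full reconstruction pipeline, but the control flow, fine-tuning on pseudo-3D motions from the test domain, is the same.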