🤖 AI Summary
Pretraining human-centric models on RGB images suffers from poor data scalability and severe annotation noise due to the absence of depth information. Method: This paper proposes a depth-agnostic frequency-domain semantic learning framework. Its core innovation is a dual-path annotation purification mechanism that jointly leverages DCT feature maps and keypoint constraints. The mechanism is combined with keypoint-guided contrastive learning, multi-scale denoising auxiliary tasks, and self-supervised pose consistency regularization, enabling robust representation learning under weak supervision. Contribution/Results: The method achieves significant performance gains on downstream tasks across benchmarks including PoseTrack and LaRa. Notably, even under 30% label noise, the pretrained model retains over 92% of its original accuracy, demonstrating strong robustness and generalization.
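
The summary does not say how the DCT feature maps are computed. As a minimal sketch of what such a map could look like, the snippet below applies an orthonormal DCT-II to non-overlapping 8×8 blocks of a grayscale image, in the style of JPEG. The block size, the function names `dct_matrix` and `block_dct_features`, and the blockwise layout are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return mat


def block_dct_features(gray: np.ndarray, block: int = 8) -> np.ndarray:
    """Blockwise 2D DCT of a grayscale image (hypothetical helper).

    Returns an array of shape (H // block, W // block, block * block):
    one vector of DCT coefficients per image block.
    """
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop so the image tiles evenly
    d = dct_matrix(block)
    tiles = gray[:h, :w].astype(np.float64)
    tiles = tiles.reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)  # (block rows, block cols, block, block)
    coeffs = d @ tiles @ d.T  # separable 2D DCT applied to every tile
    return coeffs.reshape(h // block, w // block, block * block)
```

In this layout, index 0 of each coefficient vector is the block's DC term (proportional to its mean intensity) and the remaining indices hold progressively higher spatial frequencies. A purification step like the one the summary describes could, under these assumptions, compare such coefficients against keypoint locations, but the summary gives no detail on that stage.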