🤖 AI Summary
This work addresses skeletal joints that go missing under occlusion in real-world scenes, which severely degrades human trajectory prediction. To mitigate this, we propose a self-supervised skeleton representation learning approach based on masked autoencoding. Pretraining teaches the model the intrinsic relationships between skeletal structure and motion dynamics, and the pretrained model can then be integrated into existing trajectory prediction frameworks. Without compromising prediction accuracy, the approach improves robustness to mild-to-moderate joint occlusion. Experiments show that the model consistently outperforms current baselines under occluded conditions while maintaining high accuracy and generalizing well.
📝 Abstract
Human trajectory prediction plays a crucial role in applications such as autonomous navigation and video surveillance. While recent works have explored integrating human skeleton sequences to complement trajectory information, skeleton data in real-world environments often suffer from missing joints caused by occlusions. Missing joints significantly degrade prediction accuracy, indicating the need for more robust skeleton representations. We propose a robust trajectory prediction method that incorporates a self-supervised skeleton representation model pretrained with masked autoencoding. Experimental results in occlusion-prone scenarios show that our method improves robustness to missing skeletal data without sacrificing prediction accuracy, and consistently outperforms baseline models across regimes ranging from clean input to moderate missingness.
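To make the masked-autoencoding idea concrete, the following is a minimal sketch of the generic pretraining objective: hide a random subset of joints in a skeleton sequence and score a reconstruction only on the hidden positions. It is not the authors' implementation. The array layout `(T, J, C)`, the zero-fill for hidden joints, the mask ratio, the MSE loss, and the function names `mask_joints` and `masked_mse` are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np


def mask_joints(skeleton, mask_ratio, rng):
    """Randomly hide joints in a skeleton sequence (illustrative only).

    skeleton:   float array of shape (T, J, C): frames, joints, coordinates.
    mask_ratio: fraction of the T*J joint positions to hide (assumed value).
    Returns (masked_skeleton, mask), where mask has shape (T, J) and
    True marks a hidden joint.
    """
    T, J, _ = skeleton.shape
    n_total = T * J
    n_masked = int(round(mask_ratio * n_total))

    # Pick the hidden positions without replacement.
    flat = np.zeros(n_total, dtype=bool)
    flat[rng.choice(n_total, size=n_masked, replace=False)] = True
    mask = flat.reshape(T, J)

    # Assumption: hidden joints are zero-filled. A learned mask token is
    # another common choice in masked autoencoders.
    masked = skeleton.copy()
    masked[mask] = 0.0
    return masked, mask


def masked_mse(pred, target, mask):
    """Mean squared error computed only over the hidden joints."""
    if not mask.any():
        return 0.0
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))
```

In a pretraining loop, an encoder-decoder would receive `masked` and be trained to minimize `masked_mse(prediction, skeleton, mask)`. The same `mask_joints` helper could also simulate occlusion at evaluation time by sweeping `mask_ratio`, though the paper's actual occlusion protocol is not described in this text.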