🤖 AI Summary
To address the limitation of single-frame modeling in monocular 6-DoF spacecraft pose estimation—which neglects temporal dynamics—this paper proposes a multi-frame keypoint localization method integrating motion-aware heatmaps and optical flow. Specifically, a pre-trained optical flow model extracts pixel-level motion cues to construct motion-aware heatmaps; these are fused with robust image features extracted by a Vision Transformer to achieve high-accuracy 2D keypoint regression. The 6-DoF pose is then recovered via a Perspective-n-Point (PnP) solver. The core contribution lies in explicitly modeling temporal dynamics inherent in spatial operations, thereby enhancing robustness against rapid pose changes and occlusions. Evaluated on the SPADES-RGB and SPARK-2024 datasets, the method significantly outperforms single-frame baselines: 2D keypoint localization error decreases by 18.7%, and 6-DoF pose estimation accuracy improves by 22.3%. Moreover, it demonstrates strong generalization across both real and synthetic data domains.
📝 Abstract
Monocular 6-DoF pose estimation plays an important role in multiple spacecraft missions. Most existing pose estimation approaches rely on single images with static keypoint localisation, failing to exploit valuable temporal information inherent to space operations. In this work, we adapt a deep learning framework from human pose estimation to the spacecraft pose estimation domain that integrates motion-aware heatmaps and optical flow to capture motion dynamics. Our approach combines image features from a Vision Transformer (ViT) encoder with motion cues from a pre-trained optical flow model to localise 2D keypoints. Using the estimates, a Perspective-n-Point (PnP) solver recovers 6-DoF poses from known 2D-3D correspondences. We train and evaluate our method on the SPADES-RGB dataset and further assess its generalisation on real and synthetic data from the SPARK-2024 dataset. Overall, our approach demonstrates improved performance over single-image baselines in both 2D keypoint localisation and 6-DoF pose estimation. Furthermore, it shows promising generalisation capabilities when testing on different data distributions.