Motion Aware ViT-based Framework for Monocular 6-DoF Spacecraft Pose Estimation

📅 2025-09-07

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

To address the limitation of single-frame modeling in monocular 6-DoF spacecraft pose estimation—which neglects temporal dynamics—this paper proposes a multi-frame keypoint localization method integrating motion-aware heatmaps and optical flow. Specifically, a pre-trained optical flow model extracts pixel-level motion cues to construct motion-aware heatmaps; these are fused with robust image features extracted by a Vision Transformer to achieve high-accuracy 2D keypoint regression. The 6-DoF pose is then recovered via a Perspective-n-Point (PnP) solver. The core contribution lies in explicitly modeling temporal dynamics inherent in spatial operations, thereby enhancing robustness against rapid pose changes and occlusions. Evaluated on the SPADES-RGB and SPARK-2024 datasets, the method significantly outperforms single-frame baselines: 2D keypoint localization error decreases by 18.7%, and 6-DoF pose estimation accuracy improves by 22.3%. Moreover, it demonstrates strong generalization across both real and synthetic data domains.

Technology Category

Application Category

📝 Abstract

Monocular 6-DoF pose estimation plays an important role in multiple spacecraft missions. Most existing pose estimation approaches rely on single images with static keypoint localisation, failing to exploit valuable temporal information inherent to space operations. In this work, we adapt a deep learning framework from human pose estimation to the spacecraft pose estimation domain that integrates motion-aware heatmaps and optical flow to capture motion dynamics. Our approach combines image features from a Vision Transformer (ViT) encoder with motion cues from a pre-trained optical flow model to localise 2D keypoints. Using the estimates, a Perspective-n-Point (PnP) solver recovers 6-DoF poses from known 2D-3D correspondences. We train and evaluate our method on the SPADES-RGB dataset and further assess its generalisation on real and synthetic data from the SPARK-2024 dataset. Overall, our approach demonstrates improved performance over single-image baselines in both 2D keypoint localisation and 6-DoF pose estimation. Furthermore, it shows promising generalisation capabilities when testing on different data distributions.

Problem

Research questions and friction points this paper is trying to address.

Monocular 6-DoF pose estimation for spacecraft missions

Exploiting temporal motion information in space operations

Improving 2D keypoint localization and pose accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion-aware heatmaps and optical flow integration

Vision Transformer encoder with optical flow fusion

PnP solver for 6-DoF pose recovery

🔎 Similar Papers

Domain Generalization for In-Orbit 6D Pose Estimation