Motion Aware ViT-based Framework for Monocular 6-DoF Spacecraft Pose Estimation

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitation of single-frame modeling in monocular 6-DoF spacecraft pose estimation—which neglects temporal dynamics—this paper proposes a multi-frame keypoint localization method integrating motion-aware heatmaps and optical flow. Specifically, a pre-trained optical flow model extracts pixel-level motion cues to construct motion-aware heatmaps; these are fused with robust image features extracted by a Vision Transformer to achieve high-accuracy 2D keypoint regression. The 6-DoF pose is then recovered via a Perspective-n-Point (PnP) solver. The core contribution lies in explicitly modeling the temporal dynamics inherent in space operations, thereby enhancing robustness against rapid pose changes and occlusions. Evaluated on the SPADES-RGB and SPARK-2024 datasets, the method significantly outperforms single-frame baselines: 2D keypoint localization error decreases by 18.7%, and 6-DoF pose estimation accuracy improves by 22.3%. Moreover, it demonstrates strong generalization across both real and synthetic data domains.
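The summary describes regressing 2D keypoints from fused heatmaps before passing them to PnP. A common way to decode a sub-pixel keypoint from a heatmap is a soft-argmax over pixel coordinates; the paper's exact decoding head is not specified here, so this is a minimal illustrative sketch (the `temperature` parameter and the synthetic Gaussian heatmap are assumptions, not from the paper):

```python
import numpy as np

def soft_argmax_2d(heatmap, temperature=1.0):
    """Decode a sub-pixel (x, y) keypoint from a heatmap via soft-argmax:
    softmax the heatmap into a probability map, then take the expected
    pixel coordinates under that distribution."""
    h, w = heatmap.shape
    flat = heatmap.flatten() / temperature
    probs = np.exp(flat - flat.max())          # stable softmax
    probs /= probs.sum()
    probs = probs.reshape(h, w)
    ys, xs = np.mgrid[0:h, 0:w]
    return float((probs * xs).sum()), float((probs * ys).sum())

# Synthetic Gaussian heatmap peaked at (x=40, y=25).
ys, xs = np.mgrid[0:64, 0:64]
hm = np.exp(-((xs - 40) ** 2 + (ys - 25) ** 2) / (2 * 3.0 ** 2))
x, y = soft_argmax_2d(hm, temperature=0.05)   # low temperature sharpens the peak
```

A lower temperature makes the decode closer to a hard argmax, while higher values average over a wider neighbourhood, trading robustness to noise against localisation sharpness.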

📝 Abstract
Monocular 6-DoF pose estimation plays an important role in multiple spacecraft missions. Most existing pose estimation approaches rely on single images with static keypoint localisation, failing to exploit valuable temporal information inherent to space operations. In this work, we adapt a deep learning framework from human pose estimation to the spacecraft pose estimation domain that integrates motion-aware heatmaps and optical flow to capture motion dynamics. Our approach combines image features from a Vision Transformer (ViT) encoder with motion cues from a pre-trained optical flow model to localise 2D keypoints. Using the estimates, a Perspective-n-Point (PnP) solver recovers 6-DoF poses from known 2D-3D correspondences. We train and evaluate our method on the SPADES-RGB dataset and further assess its generalisation on real and synthetic data from the SPARK-2024 dataset. Overall, our approach demonstrates improved performance over single-image baselines in both 2D keypoint localisation and 6-DoF pose estimation. Furthermore, it shows promising generalisation capabilities when testing on different data distributions.
Problem

Research questions and friction points this paper is trying to address.

Monocular 6-DoF pose estimation for spacecraft missions
Exploiting temporal motion information in space operations
Improving 2D keypoint localization and pose accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion-aware heatmaps and optical flow integration
Vision Transformer encoder with optical flow fusion
PnP solver for 6-DoF pose recovery
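The innovations above hinge on combining an appearance heatmap from the ViT encoder with a motion prior derived from optical flow. The summary does not give the exact fusion operator, so the sketch below is an assumption for illustration: it propagates the previous frame's heatmap along a dense flow field and blends it with the current heatmap using a hypothetical convex weight `alpha`:

```python
import numpy as np

def warp_heatmap(prev_hm, flow):
    """Propagate the previous frame's keypoint heatmap forward using a dense
    optical flow field (flow[y, x] = (dx, dy) in pixels), with simple
    nearest-neighbour splatting."""
    h, w = prev_hm.shape
    warped = np.zeros_like(prev_hm)
    ys, xs = np.nonzero(prev_hm > 1e-6)
    for y, x in zip(ys, xs):
        dx, dy = flow[y, x]
        nx, ny = int(round(x + dx)), int(round(y + dy))
        if 0 <= nx < w and 0 <= ny < h:
            warped[ny, nx] = max(warped[ny, nx], prev_hm[y, x])
    return warped

def fuse(appearance_hm, motion_hm, alpha=0.7):
    """Convex combination of the current appearance heatmap and the
    flow-propagated motion prior (alpha is a hypothetical weight)."""
    return alpha * appearance_hm + (1 - alpha) * motion_hm

# Previous-frame peak at (x=12, y=10), constant flow of (+5, +3) pixels.
prev = np.zeros((32, 32)); prev[10, 12] = 1.0
flow = np.zeros((32, 32, 2)); flow[..., 0] = 5.0; flow[..., 1] = 3.0
motion = warp_heatmap(prev, flow)             # peak moves to (x=17, y=13)
curr = np.zeros((32, 32)); curr[13, 16] = 1.0 # appearance peak, one pixel off
fused = fuse(curr, motion, alpha=0.7)
```

The fused map keeps the appearance evidence dominant while the motion prior supplies support when the current frame is degraded, e.g. under occlusion or rapid pose change.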
Jose Sosa
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Dan Pineau
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Arunkumar Rathinam
Research Scientist @ SnT, University of Luxembourg
Spacecraft navigation, Deep learning
Abdelrahman Shabayek
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Djamila Aouada
Senior Research Scientist, Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Image Processing, Computer Vision, Machine Learning, Artificial Intelligence