SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors

📅 2025-11-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing satellite video tracking methods suffer from poor generalization, reliance on scene-specific training, and frequent target loss under occlusion. To address these limitations, we propose a training-free, zero-shot tracking framework, the first to integrate the promptable vision foundation model SAM2 into satellite video tracking, augmented with Kalman filter-based motion constraints and a state-machine mechanism to suppress drift and enhance temporal consistency. To enable large-scale evaluation, we introduce MVOT, a large-scale synthetic satellite video tracking benchmark with over 1,500 sequences and 157K annotated frames. Experiments demonstrate state-of-the-art performance: our method achieves a 5.84% AUC gain over prior state-of-the-art methods on the OOTB benchmark, significantly outperforming both conventional trackers and existing foundation-model-based approaches. These results validate the efficacy and robustness of the zero-shot paradigm in complex remote sensing scenarios.

📝 Abstract
Existing satellite video tracking methods often struggle with generalization, requiring scenario-specific training to achieve satisfactory performance, and are prone to track loss in the presence of occlusion. To address these challenges, we propose SatSAM2, a zero-shot satellite video tracker built on SAM2, designed to adapt foundation models to the remote sensing domain. SatSAM2 introduces two core modules: a Kalman Filter-based Constrained Motion Module (KFCMM) to exploit temporal motion cues and suppress drift, and a Motion-Constrained State Machine (MCSM) to regulate tracking states based on motion dynamics and reliability. To support large-scale evaluation, we propose MatrixCity Video Object Tracking (MVOT), a synthetic benchmark containing 1,500+ sequences and 157K annotated frames with diverse viewpoints, illumination, and occlusion conditions. Extensive experiments on two satellite tracking benchmarks and MVOT show that SatSAM2 outperforms both traditional and foundation model-based trackers, including SAM2 and its variants. Notably, on the OOTB dataset, SatSAM2 achieves a 5.84% AUC improvement over state-of-the-art methods. Our code and dataset will be publicly released to encourage further research.
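The abstract describes KFCMM as a Kalman filter that supplies a motion prior to suppress drift. The paper's implementation is not reproduced here; the following is a minimal sketch, assuming a constant-velocity Kalman filter over the target centre and Mahalanobis gating of SAM2 candidate centres. The class, methods, and thresholds (MotionPrior, gate, chi2_thresh) are illustrative, not the authors' code.

```python
# Minimal sketch of a Kalman-filter motion prior gating SAM2 candidates.
# Assumptions: constant-velocity state [cx, cy, vx, vy]; observations are
# mask-centre positions. Illustrative only, not the paper's KFCMM.
import numpy as np

class MotionPrior:
    """Constant-velocity Kalman filter over the target centre (cx, cy)."""

    def __init__(self, cx, cy, dt=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])            # state: position + velocity
        self.P = np.diag([1.0, 1.0, 10.0, 10.0])         # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # constant-velocity transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # observe (cx, cy) only
        self.Q = np.eye(4) * 0.01                        # process noise
        self.R = np.eye(2) * 1.0                         # measurement noise

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def gate(self, centres, chi2_thresh=9.21):
        """Mahalanobis gating of candidate centres (chi-square, 2 dof, ~99%)."""
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        S_inv = np.linalg.inv(S)
        d = np.asarray(centres, dtype=float) - self.x[:2]
        dist = np.einsum('ij,jk,ik->i', d, S_inv, d)     # squared Mahalanobis distance
        return dist < chi2_thresh

    def update(self, z):
        """Correct the state with an accepted centre observation z = (cx, cy)."""
        y = z - self.H @ self.x                          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


# Example: reject a SAM2 candidate that contradicts the motion prior.
prior = MotionPrior(cx=120.0, cy=80.0)
prior.predict()
candidates = np.array([[122.0, 81.0], [300.0, 40.0]])   # centres of SAM2 mask proposals
keep = prior.gate(candidates)                            # -> [True, False]
prior.update(candidates[keep][0])                        # update with the accepted centre
```

In this reading, the motion prior only constrains which SAM2 proposal is accepted; segmentation itself is left to the promptable foundation model, which is what keeps the tracker training-free.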
Problem

Research questions and friction points this paper is trying to address.

Addresses poor generalization in satellite video tracking methods
Solves track loss during occlusion in satellite imagery
Eliminates need for scenario-specific training in satellite tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SAM2 foundation model for zero-shot tracking
Integrates Kalman Filter for motion-constrained object tracking
Implements state machine to regulate tracking reliability (see the illustrative sketch after this list)
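A minimal sketch of how such a state machine might regulate the tracker, assuming three illustrative states and a simple per-frame confidence and motion-consistency test; the state names, thresholds, and function below are placeholders rather than the paper's MCSM definition.

```python
# Minimal sketch of a motion-constrained tracking state machine.
# Assumptions: three illustrative states and simple thresholds; not the
# paper's MCSM transition rules.
from enum import Enum, auto

class TrackState(Enum):
    TRACKING = auto()   # SAM2 mask is confident and agrees with the motion prior
    OCCLUDED = auto()   # low confidence: propagate the Kalman prediction only
    LOST = auto()       # prolonged occlusion: fall back to re-prompting SAM2

def next_state(mask_score, motion_consistent, occluded_frames,
               conf_thresh=0.5, max_occluded=30):
    """Advance the tracker state from the per-frame mask confidence and the
    agreement between the SAM2 mask and the Kalman motion prior.

    Returns the new state and the updated occlusion counter."""
    if mask_score >= conf_thresh and motion_consistent:
        return TrackState.TRACKING, 0
    if occluded_frames + 1 > max_occluded:
        return TrackState.LOST, occluded_frames + 1
    return TrackState.OCCLUDED, occluded_frames + 1


# Example: a confident, motion-consistent frame resets the occlusion counter.
state, counter = next_state(mask_score=0.82, motion_consistent=True, occluded_frames=4)
assert state is TrackState.TRACKING and counter == 0
```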
Ruijie Fan
School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai, China
Junyan Ye
SYSU
Computer Vision and Deep Learning
Huan Chen
Shunfeng Technology Company Limited
Artificial Intelligence, Formal Methods
Zilong Huang
ByteDance Inc.
Multi-modal Learning, Computer Vision
Xiaolei Wang
School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai, China
Weijia Li
School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai, China