🤖 AI Summary
Existing satellite video tracking methods suffer from poor generalization, reliance on scene-specific training, and frequent target loss under occlusion. To address these limitations, we propose a training-free zero-shot tracking framework, the first to integrate the promptable vision foundation model SAM2 into satellite video tracking. The framework is augmented with Kalman filter-based motion constraints and a state-machine mechanism that suppress drift and enhance temporal consistency. To enable large-scale evaluation, we introduce MVOT, the first synthetic satellite video tracking dataset. Experiments demonstrate state-of-the-art performance: our method achieves a 5.84% AUC gain on the OOTB benchmark, outperforming both conventional trackers and existing foundation-model-based approaches. These results validate the effectiveness and robustness of the zero-shot paradigm in complex remote sensing scenarios.
📝 Abstract
Existing satellite video tracking methods often struggle to generalize, requiring scenario-specific training to achieve satisfactory performance, and are prone to track loss under occlusion. To address these challenges, we propose SatSAM2, a zero-shot satellite video tracker built on SAM2 that adapts vision foundation models to the remote sensing domain. SatSAM2 introduces two core modules: a Kalman Filter-based Constrained Motion Module (KFCMM), which exploits temporal motion cues to suppress drift, and a Motion-Constrained State Machine (MCSM), which regulates tracking states based on motion dynamics and tracking reliability. To support large-scale evaluation, we introduce MatrixCity Video Object Tracking (MVOT), a synthetic benchmark containing 1,500+ sequences and 157K annotated frames with diverse viewpoints, illumination, and occlusion conditions. Extensive experiments on two satellite tracking benchmarks and MVOT show that SatSAM2 outperforms both traditional and foundation model-based trackers, including SAM2 and its variants. Notably, on the OOTB dataset, SatSAM2 achieves a 5.84% AUC improvement over state-of-the-art methods. Our code and dataset will be publicly released to encourage further research.
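The abstract does not give implementation details, but the general idea behind Kalman-filter motion gating with a tracking state machine can be sketched as follows. This is a minimal illustrative sketch, not the paper's KFCMM/MCSM: a constant-velocity Kalman filter predicts the target position, measurements that land too far from the prediction are rejected, and a tiny two-state machine marks the track as lost after several consecutive rejections. All class names, gate thresholds, and noise settings here are hypothetical choices for illustration.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over state (x, y, vx, vy)."""
    def __init__(self, x, y, dt=1.0):
        self.s = np.array([x, y, 0.0, 0.0])    # state: position and velocity
        self.P = np.eye(4) * 10.0              # state covariance
        self.F = np.eye(4)                     # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                  # we observe position only
        self.Q = np.eye(4) * 0.01              # process noise (assumed)
        self.R = np.eye(2) * 1.0               # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                      # predicted position

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.s   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


class GatedTracker:
    """Gate per-frame tracker outputs with the KF prediction and keep a
    tiny 'tracking' / 'lost' state machine."""
    def __init__(self, x, y, gate=20.0, max_misses=3):
        self.kf = ConstantVelocityKF(x, y)
        self.gate = gate            # max allowed prediction-to-measurement distance
        self.max_misses = max_misses
        self.misses = 0
        self.state = "tracking"

    def step(self, measurement):
        pred = self.kf.predict()
        if (measurement is not None
                and np.linalg.norm(np.asarray(measurement) - pred) <= self.gate):
            self.kf.update(measurement)        # plausible measurement: accept
            self.misses = 0
            self.state = "tracking"
        else:
            self.misses += 1                   # implausible or missing: coast
            if self.misses >= self.max_misses:
                self.state = "lost"
        return pred, self.state
```

In use, a tracker fed spatially consistent per-frame detections stays in the `"tracking"` state and refines its motion estimate, while a sudden large jump (e.g., a drifting segmentation under occlusion) is rejected and the filter coasts on its prediction until the track is declared lost or a plausible measurement returns.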