AI Summary
This work addresses detection failures, trajectory fragmentation, and identity switches in multi-UAV tracking, which are caused by complex nonlinear swarm motion and weak visual cues. The authors propose the SCT-MOT framework, which models motion dependencies among UAVs at the swarm level rather than treating each object independently. The framework introduces a Swarm Motion-Aware Trajectory Prediction (SMTP) module to capture coupled motion dynamics, and a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that integrates trajectory predictions with multi-frame appearance features to enhance spatio-temporal consistency. Evaluated on the AIRMOT, MOT-FLY, and UAVSwarm datasets, SCT-MOT outperforms existing trackers across multiple metrics, and SMTP alone improves IDF1 by 1.21% over EqMotion when both are integrated into the same MOT framework.
Abstract
Air-to-air tracking of swarm UAVs presents significant challenges due to complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting swarm-level motion dependencies. Their limited integration of motion prediction with appearance representation also weakens the spatio-temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT-MOT, a tracking framework that integrates Swarm-Coupled motion modeling and Trajectory-guided feature fusion. First, we develop a Swarm Motion-Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective, enabling more accurate forecasting of nonlinear, coupled group trajectories. Second, we design a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current-frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air-to-air swarm UAV tracking datasets (AIRMOT, MOT-FLY, and UAVSwarm) demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state-of-the-art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, SCT-MOT consistently achieves superior accuracy and robustness compared to state-of-the-art trackers across multiple metrics under complex swarm scenarios.
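For readers unfamiliar with the IDF1 metric cited above, the following is a minimal sketch of its standard definition: the F1 score over identity-level true positives (IDTP), false positives (IDFP), and false negatives (IDFN). The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper. The abstract also does not state whether the 1.21% gain is absolute or relative.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    Returns 0.0 when all counts are zero to avoid division by zero.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical identity-level counts for two trackers (illustration only).
baseline = idf1(idtp=800, idfp=150, idfn=250)
improved = idf1(idtp=815, idfp=140, idfn=235)

# Difference expressed in percentage points.
gain_points = 100 * (improved - baseline)
print(f"baseline={baseline:.4f} improved={improved:.4f} gain={gain_points:.2f} points")
```

Because IDF1 rewards keeping the same identity on the same object over the whole sequence, it is sensitive to the identity switches and trajectory fragmentation that the abstract names as the main failure modes.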