Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos

📅 2025-08-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-object tracking (MOT) in drone-captured videos suffers from unstable data association and identity ambiguity due to severe viewpoint changes and complex inter-object motion. To address this, we propose AMOT—a joint appearance-motion modeling framework. Unlike conventional methods that treat appearance and motion cues independently, AMOT introduces an appearance-motion consistency matrix and a motion-aware trajectory continuation module, enabling bidirectional spatiotemporal coordination: appearance features guide spatial consistency modeling, while Kalman prediction is fused with a motion-aware reactivation mechanism to significantly improve trajectory continuity and ID stability. AMOT requires no additional training and is plug-and-play. It achieves state-of-the-art performance on three major UAV benchmarks—VisDrone2019, UAVDT, and VT-MOT-UAV—demonstrating strong generalization and practical deployability.

📝 Abstract
Multi-object tracking (MOT) aims to track multiple objects while maintaining consistent identities across frames of a given video. In unmanned aerial vehicle (UAV) recorded videos, frequent viewpoint changes and complex UAV-ground relative motion dynamics pose significant challenges, which often lead to unstable affinity measurement and ambiguous association. Existing methods typically model motion and appearance cues separately, overlooking their spatio-temporal interplay and resulting in suboptimal tracking performance. In this work, we propose AMOT, which jointly exploits appearance and motion cues through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module. Specifically, the AMC matrix computes bi-directional spatial consistency under the guidance of appearance features, enabling more reliable and context-aware identity association. The MTC module complements AMC by reactivating unmatched tracks through appearance-guided predictions that align with Kalman-based predictions, thereby reducing broken trajectories caused by missed detections. Extensive experiments on three UAV benchmarks, including VisDrone2019, UAVDT, and VT-MOT-UAV, demonstrate that our AMOT outperforms current state-of-the-art methods and generalizes well in a plug-and-play and training-free manner.
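The abstract describes the AMC matrix as fusing appearance-guided and spatial (motion) consistency into a single association signal. The paper's exact formulation is not reproduced on this page, so the sketch below is only a minimal illustration of the general idea: a track-detection cost matrix that blends cosine appearance similarity with IoU-based motion consistency. The weighting scheme (`alpha`), the use of plain IoU, and all function names are assumptions, not AMOT's actual method.

```python
import numpy as np

def iou_matrix(tracks, dets):
    """Pairwise IoU between track and detection boxes in (x1, y1, x2, y2) format."""
    t = tracks[:, None, :]   # (T, 1, 4)
    d = dets[None, :, :]     # (1, D, 4)
    x1 = np.maximum(t[..., 0], d[..., 0])
    y1 = np.maximum(t[..., 1], d[..., 1])
    x2 = np.minimum(t[..., 2], d[..., 2])
    y2 = np.minimum(t[..., 3], d[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter + 1e-9)

def appearance_motion_cost(track_boxes, det_boxes, track_feats, det_feats, alpha=0.5):
    """Fuse cosine appearance similarity with IoU motion consistency into one
    cost matrix (lower = better match). `alpha` is an illustrative blend weight."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    app_sim = t @ d.T                          # (T, D) cosine similarity
    mot_sim = iou_matrix(track_boxes, det_boxes)
    return 1.0 - (alpha * app_sim + (1 - alpha) * mot_sim)
```

In a full tracker, this cost matrix would be fed to a linear assignment solver (e.g. the Hungarian algorithm) to produce track-detection matches.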
Problem

Research questions and friction points this paper is trying to address.

Address unstable affinity in UAV videos due to viewpoint changes
Improve motion-appearance integration for better identity association
Reduce broken trajectories from missed detections in MOT
Innovation

Methods, ideas, or system contributions that make the work stand out.

Appearance-Motion Consistency matrix for reliable association
Motion-aware Track Continuation module reducing broken trajectories
Jointly exploits appearance and motion cues
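The track-continuation idea above (reviving unmatched tracks when a leftover detection agrees with both the Kalman-predicted box and the stored appearance feature) can be sketched as follows. This is a generic illustration under assumed thresholds (`iou_thr`, `app_thr`); the names and gating logic are hypothetical and not AMOT's actual MTC module.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def iou(b1, b2):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def reactivate(lost_tracks, leftover_dets, iou_thr=0.3, app_thr=0.6):
    """Revive a lost track when a leftover detection overlaps its
    Kalman-predicted box AND matches its stored appearance embedding.
    lost_tracks: list of (track_id, predicted_box, embedding)
    leftover_dets: list of (det_box, embedding)
    Returns a list of (track_id, det_index) revivals."""
    revived, used = [], set()
    for tid, pred_box, feat in lost_tracks:
        best, best_score = None, 0.0
        for j, (det_box, det_feat) in enumerate(leftover_dets):
            if j in used or iou(pred_box, det_box) < iou_thr:
                continue  # motion gate: prediction and detection must overlap
            s = cosine(feat, det_feat)
            if s >= app_thr and s > best_score:
                best, best_score = j, s  # appearance gate: keep best match
        if best is not None:
            used.add(best)
            revived.append((tid, best))
    return revived
```

Gating on both cues before reviving a track is what keeps identities stable: a detection that merely overlaps the prediction, or merely looks similar, is not enough on its own.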
Jianbo Ma
Institute of Optics and Electronics, Chinese Academy of Sciences
Hui Luo
Institute of Optics and Electronics, Chinese Academy of Sciences
Qi Chen
University of Adelaide
Yuankai Qi
Assistant Professor, Macquarie University
Vision-Language Navigation · Speech Synthesis · Visual Tracking · Crowd Counting
Yumei Sun
Institute of Optics and Electronics, Chinese Academy of Sciences
Amin Beheshti
Full Professor, School of Computing, Macquarie University, Sydney, Australia
Applied AI · Data Science · Big Data Analytics · Software/Data Engineering · Service/Social Computing
Jianlin Zhang
Institute of Optics and Electronics, Chinese Academy of Sciences
Ming-Hsuan Yang
University of California at Merced; Google DeepMind
Computer Vision · Machine Learning · Artificial Intelligence