🤖 AI Summary
Consumer-grade drones pose growing airspace and public-safety risks, yet existing detection methods cannot predict future 3D trajectories, which hinders proactive countermeasures. This paper proposes the first unsupervised visual trajectory prediction framework that requires no manual 3D annotations: it generates pseudo-labels via unsupervised LiDAR-based trajectory extraction and cross-modal (RGB + event-camera) motion alignment, incorporates kinematic priors as constraints, and introduces a vision-oriented Mamba architecture for long-horizon, self-supervised 3D trajectory forecasting. Evaluated on the MMAUD dataset, the method reduces 5-second prediction error by approximately 40% compared to supervised image-based and audio-visual baselines, while running in real time for active drone countermeasures.
📝 Abstract
The widespread use of consumer drones has introduced serious challenges for airspace security and public safety. Their high agility and unpredictable motion make drones difficult to track and intercept. Existing methods focus on detecting current positions, but effective counter-drone strategies depend on forecasting future trajectories, so reactive detection alone is not enough. To address this critical gap, we propose an unsupervised vision-based method for predicting the three-dimensional trajectories of drones. Our approach first extracts drone trajectories from raw LiDAR point clouds with an unsupervised technique, then aligns these trajectories with camera images through motion consistency to generate reliable pseudo-labels. We then combine kinematic estimation with a visual Mamba neural network in a self-supervised manner to predict future drone trajectories. We evaluate our method on the challenging MMAUD dataset, including the V2 sequences, which feature wide-field-of-view multimodal sensors and dynamic UAV motion in urban scenes. Extensive experiments show that our framework outperforms supervised image-only and audio-visual baselines in long-horizon trajectory prediction, reducing 5-second 3D error by around 40 percent without using any manual 3D labels. The proposed system offers a cost-effective, scalable alternative for real-time counter-drone deployment. All code will be released upon acceptance to support reproducible research in the robotics community.
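To make the role of the kinematic prior concrete, the sketch below shows the simplest form such a prior can take: constant-velocity extrapolation of a drone's 3D track. This is an illustrative assumption, not the paper's actual formulation; the function name, sampling setup, and the idea of a learned residual on top of the prior are all hypothetical. In the proposed framework, a visual Mamba network would refine such a prior rather than replace it.

```python
import numpy as np

def kinematic_prior_forecast(past_xyz, dt, horizon_s):
    """Constant-velocity kinematic prior (illustrative sketch).

    past_xyz : (T, 3) array of observed 3D positions, sampled every dt seconds.
    Returns a (H, 3) array of predicted positions over the next horizon_s seconds.
    """
    past_xyz = np.asarray(past_xyz, dtype=float)
    # Mean velocity over the observed window (a least-squares slope would
    # also work; the endpoint finite difference is the simplest estimator).
    v = (past_xyz[-1] - past_xyz[0]) / ((len(past_xyz) - 1) * dt)
    steps = int(round(horizon_s / dt))
    t = np.arange(1, steps + 1)[:, None] * dt   # (H, 1) future time offsets
    return past_xyz[-1] + t * v                 # (H, 3) extrapolated positions

# A learned sequence model (e.g. a Mamba-style network) would then predict a
# residual correction on top of this prior; only the prior is shown here.
```

For a drone flying in a straight line at 1 m/s along x, observed at 1 Hz, this prior continues the line exactly; any deviation from constant velocity is what the learned component must capture.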