Label-Free Long-Horizon 3D UAV Trajectory Prediction via Motion-Aligned RGB and Event Cues

📅 2025-07-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Consumer-grade drones pose growing risks to airspace and public safety, yet existing detection methods cannot predict future 3D trajectories, which hinders proactive countermeasures. This paper proposes the first unsupervised visual trajectory prediction framework that requires no manual 3D annotations: it generates pseudo-labels via unsupervised LiDAR-based trajectory extraction and cross-modal (RGB + event-camera) motion alignment, incorporates kinematic priors as constraints, and introduces a vision-oriented Mamba architecture for long-horizon, self-supervised 3D trajectory forecasting. Evaluated on the MMAUD dataset, the method reduces 5-second prediction error by approximately 40% compared to supervised image-based and audio-visual baselines, while remaining fast enough for real-time deployment in active drone countermeasures.

📝 Abstract
The widespread use of consumer drones has introduced serious challenges for airspace security and public safety. Their high agility and unpredictable motion make drones difficult to track and intercept. While existing methods focus on detecting current positions, many counter-drone strategies rely on forecasting future trajectories and thus require more than reactive detection to be effective. To address this critical gap, we propose an unsupervised vision-based method for predicting the three-dimensional trajectories of drones. Our approach first uses an unsupervised technique to extract drone trajectories from raw LiDAR point clouds, then aligns these trajectories with camera images through motion consistency to generate reliable pseudo-labels. We then combine kinematic estimation with a visual Mamba neural network in a self-supervised manner to predict future drone trajectories. We evaluate our method on the challenging MMAUD dataset, including the V2 sequences that feature wide-field-of-view multimodal sensors and dynamic UAV motion in urban scenes. Extensive experiments show that our framework outperforms supervised image-only and audio-visual baselines in long-horizon trajectory prediction, reducing 5-second 3D error by around 40 percent without using any manual 3D labels. The proposed system offers a cost-effective, scalable alternative for real-time counter-drone deployment. All code will be released upon acceptance to support reproducible research in the robotics community.
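The abstract combines a kinematic estimate with a learned Mamba network for forecasting. As a rough illustration of the kinematic-prior half of that idea, the sketch below extrapolates a pseudo-labeled 3D track with a constant-velocity model; the function name, the constant-velocity assumption, and the sampling rate are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def constant_velocity_forecast(track, horizon_s, dt):
    """Extrapolate a 3D track using a constant-velocity kinematic prior.

    track: (N, 3) array of past 3D positions sampled every dt seconds.
    Returns an (horizon_s / dt, 3) array of predicted future positions.
    """
    velocity = (track[-1] - track[-2]) / dt      # last observed velocity
    steps = int(round(horizon_s / dt))
    # Offsets grow linearly with the step index under constant velocity.
    offsets = np.arange(1, steps + 1)[:, None] * velocity * dt
    return track[-1] + offsets

# Example: a drone moving 1 m/s along x at 10 m altitude, predicted 5 s ahead at 10 Hz.
past = np.stack([np.array([0.1 * i, 0.0, 10.0]) for i in range(20)])
future = constant_velocity_forecast(past, horizon_s=5.0, dt=0.1)
```

In the paper's framing, a learned network would refine such a coarse kinematic estimate rather than replace it, which is what makes long-horizon prediction tractable without manual labels.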
Problem

Research questions and friction points this paper is trying to address.

Predicts 3D drone trajectories without manual labels
Combines LiDAR and camera data for trajectory alignment
Improves long-horizon UAV motion forecasting accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised trajectory extraction from LiDAR
Motion-aligned RGB and event cues
Self-supervised visual Mamba network
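The motion-alignment contribution above filters pseudo-labels by checking that LiDAR-derived motion agrees with image-measured motion. A minimal sketch of one plausible consistency gate is shown below; the cosine-similarity test and the 0.9 threshold are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def motion_consistent(lidar_disp_2d, image_flow_2d, cos_thresh=0.9):
    """Accept a pseudo-label only if the projected LiDAR displacement and the
    image-measured motion (e.g., optical flow or event-based flow) point in
    roughly the same direction, measured by cosine similarity."""
    a = np.asarray(lidar_disp_2d, dtype=float)
    b = np.asarray(image_flow_2d, dtype=float)
    a = a / (np.linalg.norm(a) + 1e-8)   # normalize; epsilon guards zero motion
    b = b / (np.linalg.norm(b) + 1e-8)
    return float(a @ b) >= cos_thresh

# Nearly parallel motions pass the gate; orthogonal motions are rejected.
ok = motion_consistent([1.0, 0.0], [1.0, 0.05])
bad = motion_consistent([1.0, 0.0], [0.0, 1.0])
```

Gating pseudo-labels this way trades recall for precision: fewer training samples survive, but the ones that do are less likely to come from misassociated LiDAR clusters.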
Hanfang Liang
Jianghan University, Wuhan, China.
Shenghai Yuan
Nanyang Technological University, Singapore.
Fen Liu
Nanyang Technological University, Singapore.
Yizhuo Yang
Nanyang Technological University, Singapore.
Bing Wang
Jianghan University, Wuhan, China.
Zhuyu Huang
Beihang University (BUAA), Beijing, China.
Chenyang Shi
Beihang University
Jing Jin
Beihang University (BUAA), Beijing, China.