A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor robustness of single-modal visual tracking in complex environments, this paper introduces MM-UAV, the first large-scale, tri-modal unmanned aerial vehicle (UAV) multi-object tracking benchmark, incorporating synchronized RGB, infrared, and event-camera modalities. It comprises 1,321 sequences, over 2.8 million annotated frames, and more than 30 challenging scenarios. Methodologically, the authors propose an offset-guided adaptive sensor alignment module to correct spatial misalignment across modalities, a dynamic multi-modal fusion mechanism to weight modality-specific features contextually, and an event-driven identity association strategy to strengthen motion-cue modelling. The end-to-end framework achieves significant performance gains over existing state-of-the-art methods on MM-UAV. To foster reproducible research, the dataset, code, and trained models are fully open-sourced, establishing a unified evaluation benchmark and technical foundation for multi-modal UAV tracking.

📝 Abstract
With the proliferation of low-altitude unmanned aerial vehicles (UAVs), visual multi-object tracking is becoming a critical security technology, demanding strong robustness even in complex environmental conditions. However, tracking UAVs with a single visual modality often fails in challenging scenarios such as low illumination, cluttered backgrounds, and rapid motion. Although multi-modal multi-object UAV tracking is more resilient, the development of effective solutions has been hindered by the absence of dedicated public datasets. To bridge this gap, we release MM-UAV, the first large-scale benchmark for Multi-Modal UAV Tracking, integrating three key sensing modalities, namely RGB, infrared (IR), and event signals. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework, designed specifically for UAV tracking applications and serving as a baseline for future research. Our framework incorporates two key technical innovations: an offset-guided adaptive alignment module that resolves spatial mismatches across sensors, and an adaptive dynamic fusion module that balances the complementary information conveyed by different modalities. Furthermore, to overcome the limitations of conventional appearance modelling in multi-object tracking, we introduce an event-enhanced association mechanism that leverages motion cues from the event modality for more reliable identity maintenance. Comprehensive experiments demonstrate that the proposed framework consistently outperforms state-of-the-art methods. To foster further research in multi-modal UAV tracking, both the dataset and source code will be made publicly available at https://xuefeng-zhu5.github.io/MM-UAV/.
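The abstract describes the adaptive dynamic fusion module only at a high level. As a rough illustration of what contextually weighting modality-specific features can look like, the following is a minimal gated-fusion sketch in PyTorch; the class name, gating design, and tensor shapes are assumptions for exposition, not the paper's implementation.

```python
# Illustrative sketch only: per-modality gated fusion in the spirit of the
# paper's "adaptive dynamic fusion module". All design choices here are assumed.
import torch
import torch.nn as nn


class GatedTriModalFusion(nn.Module):
    """Fuses aligned RGB, IR, and event feature maps with learned per-modality gates."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one scalar gate per modality from globally pooled, concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(3 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 3),
        )

    def forward(self, rgb, ir, event):
        # rgb, ir, event: (B, C, H, W) feature maps, assumed already spatially aligned.
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in (rgb, ir, event)], dim=1)  # (B, 3C)
        weights = torch.softmax(self.gate(pooled), dim=1)                           # (B, 3)
        stacked = torch.stack([rgb, ir, event], dim=1)                              # (B, 3, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)               # (B, C, H, W)


# Usage: fused = GatedTriModalFusion(256)(rgb_feat, ir_feat, event_feat)
```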
Problem

Research questions and friction points this paper is trying to address.

Addressing single-modality tracking failures in challenging UAV scenarios
Providing first multi-modal UAV tracking dataset with RGB/IR/event data
Developing adaptive fusion framework for robust multi-modal UAV tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal UAV tracking with RGB, IR, event signals
Offset-guided alignment module resolves sensor mismatches
Event-enhanced association uses motion cues for tracking (see the sketch after this list)
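The event-enhanced association mechanism is likewise only summarised here. A minimal sketch of the general idea, blending an appearance cost with an event-derived motion cost before Hungarian matching, is shown below; the cost definitions, the alpha weight, and the threshold are assumptions, not the authors' method.

```python
# Illustrative sketch only: combining appearance and event-motion costs for
# identity association. All parameters and cost definitions are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(appearance_cost: np.ndarray,
              event_motion_cost: np.ndarray,
              alpha: float = 0.5,
              max_cost: float = 0.8):
    """Return (track_idx, detection_idx) pairs whose blended cost is below max_cost.

    appearance_cost:  (num_tracks, num_detections) appearance dissimilarity.
    event_motion_cost: (num_tracks, num_detections) motion dissimilarity derived
                       from event-stream cues (e.g. predicted displacement error).
    """
    cost = alpha * appearance_cost + (1.0 - alpha) * event_motion_cost
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```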
Tianyang Xu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
Jinjie Gu
Ant Group
Machine learning, recommendation
Xuefeng Zhu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
XiaoJun Wu
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
Josef Kittler
University of Surrey
Engineering