🤖 AI Summary
This work addresses the detection of tiny unmanned aerial vehicles (UAVs) from an onboard event camera when both the observer platform and the target UAV are moving. In this regime, ego-motion produces heavy background clutter while the target contributes only sparse events. The authors introduce M²E-UAV, a benchmark for such “motion-on-motion” event-based UAV detection, and propose M²E-Point, a point-set baseline that takes raw [x, y, t, p] event streams as input. The method uses EdgeConv to capture local event structure and predicts event-level UAV foreground scores, from which bounding boxes are derived with DBSCAN clustering; an IMU-conditioned variant is also studied. On the validation set, M²E-Point achieves 0.9673 F1 and 0.5501 mAP50-95, and the IMU-conditioned variant reaches 0.5561 mAP50-95, indicating that point-based modeling is a strong baseline and that simple IMU conditioning brings only a marginal aggregate gain.
📝 Abstract
Tiny UAV detection from an onboard event camera is difficult when the observer and target move at the same time. In this motion-on-motion regime, ego-motion activates background edges across buildings, vegetation, and horizon structures, while the UAV may appear only as a sparse event cluster. To explore this practical problem, we present M$^2$E-UAV, a benchmark and analysis setup for onboard motion-on-motion event-based tiny UAV detection. The processed M$^2$E-UAV benchmark contains 87,223 training samples and 21,395 validation samples across four scene families: sunny building-forest, sunny farm-village, sunset building-forest, and sunset farm-village. We provide M$^2$E-Point, a point-based event baseline, and M$^2$E-Point + IMU, an IMU-conditioned variant, to analyze the role of inertial cues in onboard motion-on-motion detection. M$^2$E-Point encodes events as $[x,y,t,p]$ point sets, extracts local event structure with EdgeConv, and predicts event-level UAV foreground scores, from which bounding boxes are derived via DBSCAN. Our validation-stage analysis shows that point-based event modeling is a strong baseline, while simple IMU conditioning provides only marginal aggregate gains. Under the train/validation split, M$^2$E-Point achieves 0.9673 F1 and 0.5501 mAP50-95, while the IMU-conditioned variant reaches 0.5561 mAP50-95. These results serve as an initial baseline for future exploration in this domain. Code will be available at https://github.com/Wickyan/M2E-UAV.
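The final step described above, deriving bounding boxes from event-level foreground scores via DBSCAN, can be illustrated with a minimal sketch. This is not the authors' implementation: the score threshold, the `eps` and `min_samples` values, the choice to cluster on pixel coordinates $(x, y)$ only, and the function names `dbscan` and `events_to_boxes` are all assumptions made for illustration. The foreground scores are taken as given; the EdgeConv network that would produce them is not modeled here.

```python
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if (xj - xi) ** 2 + (yj - yi) ** 2 <= eps ** 2]


def dbscan(points, eps, min_samples):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def events_to_boxes(events, scores, score_thr=0.5, eps=3.0, min_samples=4):
    """Turn scored events into boxes (x_min, y_min, x_max, y_max).

    events: list of (x, y, t, p) tuples; scores: per-event foreground
    scores in [0, 1]. All default parameter values are hypothetical.
    """
    # Keep only events scored as foreground, then cluster them spatially.
    fg = [(x, y) for (x, y, t, p), s in zip(events, scores) if s >= score_thr]
    labels = dbscan(fg, eps, min_samples)
    boxes = {}
    for (x, y), lab in zip(fg, labels):
        if lab == -1:
            continue  # isolated foreground events are discarded as noise
        x0, y0, x1, y1 = boxes.get(lab, (x, y, x, y))
        boxes[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return [boxes[k] for k in sorted(boxes)]


# Toy input: a dense 3x3 foreground cluster, one stray high-score event,
# and a row of low-score background events.
events = [(100 + dx, 50 + dy, 0.0, 1) for dx in range(3) for dy in range(3)]
scores = [0.9] * 9
events.append((300, 200, 0.0, 1))
scores.append(0.9)
events += [(10 * k, 5, 0.0, 0) for k in range(20)]
scores += [0.1] * 20

print(events_to_boxes(events, scores))  # [(100, 50, 102, 52)]
```

In this toy case the stray high-score event is rejected because it has too few neighbors, which suggests why a density-based step can suit sparse foreground predictions surrounded by clutter. A practical pipeline would more likely use an optimized routine such as scikit-learn's `DBSCAN` and might also cluster along the time axis.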