🤖 AI Summary
To address low detection accuracy in complex scenes where small unmanned aerial vehicles (UAVs) are easily confused with birds, this paper proposes a robust detection method based on YOLOv11. The approach introduces three key innovations: (1) a multi-scale prediction aggregation mechanism that jointly leverages patch-wise and full-image features to improve localization of small objects; (2) a semantic-aware Copy-Paste data augmentation strategy that improves inter-class discriminability by preserving contextual semantics during instance synthesis; and (3) a trajectory-consistency-driven post-processing module that compensates for missed detections across frames, thereby strengthening temporal robustness. Evaluated in the 8th WOSDETC Drone-vs-Bird Detection Grand Challenge at IJCNN 2025, the method ranked among the top three entries, achieving significant improvements in mean Average Precision (mAP) and recall over baseline models, particularly on small-object detection.
📝 Abstract
Detecting small drones, often indistinguishable from birds, is crucial for modern surveillance. This work introduces a drone detection methodology built upon the medium-sized YOLOv11 object detection model. To enhance its performance on small targets, we implemented a multi-scale approach in which the input image is processed both as a whole and in segmented parts, with the resulting predictions subsequently aggregated. We also utilized a copy-paste data augmentation technique to enrich the training dataset with diverse drone and bird examples. Finally, we implemented a post-processing technique that leverages frame-to-frame consistency to mitigate missed detections. The proposed approach attained a top-3 ranking in the 8th WOSDETC Drone-vs-Bird Detection Grand Challenge, held at the 2025 International Joint Conference on Neural Networks (IJCNN), demonstrating its ability to detect drones effectively in complex environments.
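The patch-plus-full-image inference described above can be sketched as follows. Everything here (`detect_multiscale`, `make_tiles`, the toy detector stub, the tile size and overlap values) is illustrative, not the authors' implementation; a real pipeline would invoke the YOLOv11 model wherever the detector callback is called.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(dets, iou_thr=0.5):
    """Greedy non-maximum suppression over (x0, y0, x1, y1, score) tuples."""
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d[:4], k[:4]) < iou_thr for k in kept):
            kept.append(d)
    return kept

def make_tiles(width, height, tile, overlap):
    """Top-left origins of overlapping tiles covering the whole frame."""
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    if xs[-1] + tile < width:   # make sure the right edge is covered
        xs.append(width - tile)
    if ys[-1] + tile < height:  # ...and the bottom edge
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

def detect_multiscale(detector, width, height, tile=640, overlap=0.2):
    """Run `detector` on the full frame and on overlapping tiles, map
    tile-local boxes back to frame coordinates, then merge via NMS."""
    dets = list(detector(0, 0, width, height))  # full-image pass
    for x0, y0 in make_tiles(width, height, tile, overlap):
        for bx0, by0, bx1, by1, score in detector(x0, y0, x0 + tile, y0 + tile):
            dets.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    return nms(dets)

# Toy detector stub for demonstration only: it "finds" one 40x40 target at
# (700, 300) when the target lies fully inside the queried region, scoring
# higher on small crops (mimicking better small-object recall on tiles).
def fake_detector(x0, y0, x1, y1):
    tx0, ty0, tx1, ty1 = 700, 300, 740, 340
    if tx0 >= x0 and ty0 >= y0 and tx1 <= x1 and ty1 <= y1:
        score = 0.9 if (x1 - x0) <= 640 else 0.4
        return [(tx0 - x0, ty0 - y0, tx1 - x0, ty1 - y0, score)]
    return []
```

Because the same target is seen by the full-image pass and by overlapping tiles, the final NMS step keeps only the highest-scoring duplicate; in this sketch the tile-level detection wins, illustrating why patch-wise inference helps small targets.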
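The frame-to-frame consistency post-processing can likewise be sketched. This is a simplified interpolation heuristic under assumed names (`recover_missed`, a center-distance match with a hypothetical `tol` threshold); the paper's actual trajectory logic may differ.

```python
def _center(box):
    """Center point of an (x0, y0, x1, y1, ...) box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def _close(a, b, tol):
    """True if the two boxes' centers lie within `tol` pixels on each axis."""
    (ax, ay), (bx, by) = _center(a), _center(b)
    return abs(ax - bx) <= tol and abs(ay - by) <= tol

def recover_missed(frames, tol=20.0):
    """frames: per-frame lists of (x0, y0, x1, y1, score) detections.
    If a detection in frame t-1 has a nearby match in frame t+1 but no
    nearby detection in frame t, insert the midpoint box into frame t,
    treating the gap as a missed detection rather than a vanished object."""
    out = [list(f) for f in frames]
    for t in range(1, len(frames) - 1):
        for prev in frames[t - 1]:
            for nxt in frames[t + 1]:
                if not _close(prev, nxt, tol):
                    continue
                mid = tuple((p + n) / 2.0 for p, n in zip(prev[:4], nxt[:4]))
                if any(_close(mid, cur, tol) for cur in frames[t]):
                    continue  # frame t already covers this object
                out[t].append(mid + (min(prev[4], nxt[4]),))
    return out
```

The recovered box takes the lower of the two neighboring scores, a conservative choice so that interpolated detections never outrank real ones.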