🤖 AI Summary
This work addresses the challenge of detecting small drone targets from ground-level views, where low pixel occupancy, cluttered backgrounds, and strict real-time requirements pose significant difficulties. To this end, we propose a lightweight yet high-resolution detection framework built upon the YOLO26 architecture. Our approach incorporates a P2 high-resolution detection head, adopts DFL-free and NMS-free designs, and integrates ProgLoss with the STAL training mechanism. Furthermore, a MuSGD hybrid optimization strategy mitigates the gradient oscillations caused by sparse small objects. Evaluated on our newly curated DroneSOD-30K dataset, the proposed method achieves an mAP@0.5 of 86.0%, outperforming YOLOv5n by 7.8 percentage points, while sustaining 226 FPS on an RTX 5090 GPU and 35 FPS on CPU, thus balancing accuracy with edge-deployment efficiency.
📝 Abstract
Detecting small unmanned aerial vehicles (UAVs) from a ground-to-air (G2A) perspective presents significant challenges, including extremely low pixel occupancy, cluttered aerial backgrounds, and strict real-time constraints. Existing YOLO-based detectors are primarily optimized for general object detection: they often lack sufficient feature resolution for targets spanning only a few pixels, while adding deployment complexity. In this paper, we propose SDD-YOLO, a small-target detection framework tailored for G2A anti-UAV surveillance. To capture the fine-grained spatial details critical for micro-targets, SDD-YOLO introduces a P2 high-resolution detection head operating at a downsampling stride of 4. We further integrate recent architectural advances from YOLO26, including a DFL-free, NMS-free design for streamlined inference and the MuSGD hybrid training strategy with ProgLoss and STAL, which substantially mitigates gradient oscillation on sparse small-target signals. To support our evaluation, we construct DroneSOD-30K, a large-scale G2A dataset of approximately 30,000 annotated images covering diverse meteorological conditions. Experiments demonstrate that SDD-YOLO-n achieves an mAP@0.5 of 86.0% on DroneSOD-30K, surpassing the YOLOv5n baseline by 7.8 percentage points. Extensive inference analysis shows that our model attains 226 FPS on an NVIDIA RTX 5090 and 35 FPS on an Intel Xeon CPU, demonstrating strong efficiency for future edge deployment.
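As a rough, back-of-envelope illustration (not taken from the paper) of why a stride-4 P2 head helps with tiny drones: a YOLO-style head at stride `s` predicts on a `(W/s)×(H/s)` grid, so halving the stride quadruples the number of grid cells available to localize a target that occupies only a handful of pixels. The input size of 640 below is an assumed, typical YOLO resolution.

```python
# Back-of-envelope: detection-grid sizes for YOLO-style heads at each stride.
# A stride-4 (P2) level assigns several grid cells to a ~8-pixel drone,
# whereas the usual coarsest-fine stride-8 (P3) level may cover it with one.

def grid_side(input_size: int, stride: int) -> int:
    """Number of cells along one axis of the feature map at a given stride."""
    return input_size // stride

for name, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    side = grid_side(640, stride)
    print(f"{name}: stride {stride:2d} -> {side}x{side} grid ({side * side} cells)")
```

For a 640×640 input this prints a 160×160 grid for P2 versus 80×80 for P3, i.e. 4× the spatial density, at the cost of extra compute on the high-resolution feature map.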