From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution

📅 2024-01-26
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Detection performance in aerial imagery degrades because objects are small, densely packed, blurred, and often occluded. To address this, the paper proposes a detection framework that pairs super-resolution enhancement with an adapted lightweight YOLOv5 architecture. Transformer encoder blocks are added to the YOLOv5 architecture so the model can capture global contextual information, which helps most in high-density, occluded scenes. The model is evaluated on several aerial datasets, including VisDrone-2023, SeaDroneSee, VEDAI, and NWPU VHR-10, and reaches a mean Average Precision (mAP) of 52.5% on VisDrone, which the authors report as exceeding the top prior works. The lightweight design is intended to keep resource use low, making the model suitable for real-time applications.
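To make the "global context" idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention, the core operation inside a Transformer encoder block. It is an illustration only and not the paper's block: the query, key, and value projections are assumed to be the identity, and the function names `softmax` and `self_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head self-attention with identity Q/K/V projections (assumption).

    Each output token is a weighted average of ALL input tokens, so every
    position can draw on information from every other position.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here: the tokens themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out
```

In a detector, the tokens would be flattened feature-map positions, so a small object's features can be refined using cues from distant parts of the image.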

📝 Abstract
The demand for accurate object detection in aerial imagery has surged with the widespread use of drones and satellite technology. Traditional object detection models, trained on datasets biased towards large objects, struggle to perform optimally in aerial scenarios where small, densely clustered objects are prevalent. To address this challenge, we present an innovative approach that combines super-resolution and an adapted lightweight YOLOv5 architecture. We employ a range of datasets, including VisDrone-2023, SeaDroneSee, VEDAI, and NWPU VHR-10, to evaluate our model's performance. Our Super Resolved YOLOv5 architecture features Transformer encoder blocks, allowing the model to capture global context information, leading to improved detection results, especially in high-density, occluded conditions. This lightweight model not only delivers improved accuracy but also ensures efficient resource utilization, making it well-suited for real-time applications. Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects, underlining the significance of dataset choice and architectural adaptation for this specific task. In particular, the method achieves 52.5% mAP on VisDrone, exceeding top prior works. This approach promises to significantly advance object detection in aerial imagery, contributing to more accurate and reliable results in a variety of real-world applications.
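For readers unfamiliar with the mAP metric quoted above, the following is a minimal sketch of how average precision can be computed for one class on one image at a single IoU threshold. It is a generic illustration and not the paper's evaluation protocol: the 0.5 threshold, the simplified greedy matching, and the names `iou` and `average_precision` are assumptions. mAP is then the mean of such AP values over classes (and, in some benchmarks, over several IoU thresholds).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_thr: float = 0.5) -> float:
    """AP for one class; detections are (score, box) pairs."""
    if not ground_truth:
        return 0.0
    matched = set()
    flags = []  # True = true positive, False = false positive
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Greedily match to the best still-unmatched ground-truth box.
        best, best_i = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(box, gt) > best:
                best, best_i = iou(box, gt), i
        hit = best >= iou_thr
        if hit:
            matched.add(best_i)
        flags.append(hit)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(flags, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

Small objects make this metric demanding: a localization error of a few pixels on a tiny box can drop its IoU below the threshold and turn a correct detection into a false positive.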
Problem

Research questions and friction points this paper is trying to address.

Detect small, dense aerial objects with limited pixels
Improve image quality degraded by distance and motion blur
Enhance YOLOv5 for efficient aerial object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework with super-resolution and YOLOv5
Aerial-optimized SRGAN fine-tuning for image recovery
Efficient Attention Module and CLFPN architectural enhancements
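The two-stage idea listed above (enhance the image first, then detect) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: nearest-neighbour upscaling stands in for the fine-tuned SRGAN, `toy_detector` is a hypothetical placeholder for YOLOv5, and all function names are invented for the example. The one point it does show faithfully is that boxes predicted on the upscaled image must be divided by the scale factor to land in original-image coordinates.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                  # grayscale image as rows of pixels
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def upscale_nearest(img: Image, scale: int) -> Image:
    """Stand-in for the super-resolution stage (assumption: nearest neighbour)."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(scale)]  # repeat each pixel
        out.extend(list(wide) for _ in range(scale))   # repeat each row
    return out


def toy_detector(img: Image) -> List[Box]:
    """Hypothetical detector: one box around all non-zero pixels."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for row in img for x, p in enumerate(row) if p]
    if not xs:
        return []
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def detect_with_super_resolution(img: Image,
                                 detector: Callable[[Image], List[Box]],
                                 scale: int = 4) -> List[Box]:
    """Stage 1: upscale. Stage 2: detect. Then map boxes back."""
    enhanced = upscale_nearest(img, scale)
    boxes = detector(enhanced)
    # Rescale coordinates from the enhanced image to the original image.
    return [tuple(c / scale for c in box) for box in boxes]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    image[2][1] = 255  # a single-pixel "object"
    print(detect_with_super_resolution(image, toy_detector))
```

With a learned super-resolution model in place of `upscale_nearest`, a one-pixel object would occupy a 4x4 region with recovered detail rather than a repeated value, which is what gives the detector more evidence to work with.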