🤖 AI Summary
To address the low detection accuracy for small objects in real-time aerial imagery and the inherent trade-off between speed and accuracy, this paper proposes FBRT-YOLO, a lightweight, real-time detector tailored for deployment on embedded aerial platforms. It introduces two modules: the Feature Complementary Mapping (FCM) module, designed to improve the spatial localization of small objects, and the Multi-Kernel Perception (MKP) unit, which enables efficient multi-scale object perception. The architecture integrates spatial-semantic alignment, parallel multi-scale convolutional kernel processing, and lightweight feature fusion. Evaluated on the VisDrone, UAVDT, and AI-TOD benchmarks, FBRT-YOLO consistently outperforms state-of-the-art real-time detectors: it improves average precision (AP) by 2.1–4.7 percentage points while maintaining an inference speed above 30 FPS, improving accuracy and efficiency together.
📝 Abstract
Embedded flight devices with visual capabilities have become essential for a wide range of applications.
In aerial image detection, many existing methods have partially addressed small target detection, but challenges remain in improving it further and in balancing detection accuracy with efficiency.
These issues are key obstacles to the advancement of real-time aerial image detection.
In this paper, we propose a new family of real-time detectors for aerial image detection, named FBRT-YOLO, to address the imbalance between detection accuracy and efficiency. Our method comprises two lightweight modules, the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Unit (MKP), both designed to enhance the perception of small targets in aerial images.
FCM alleviates the information imbalance caused by the loss of small target information in deep networks. It aims to integrate the spatial positional information of targets more deeply into the network and to align it better with the semantic information in deeper layers, thereby improving the localization of small targets.
MKP leverages convolutions with kernels of different sizes to strengthen the relationships between targets of various scales and to improve their perception.
Extensive experimental results on three major aerial image datasets (VisDrone, UAVDT, and AI-TOD) demonstrate that FBRT-YOLO outperforms various real-time detectors in both accuracy and speed.
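To illustrate the multi-kernel idea behind MKP, the following is a minimal pure-Python sketch and not the authors' implementation. It assumes fixed averaging kernels in place of learned weights, kernel sizes of 3, 5, and 7, and fusion of the branches by element-wise summation; none of these details are given in the abstract, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def box_kernel(size: int) -> Grid:
    """Return a size x size averaging kernel (a stand-in for learned weights)."""
    if size < 1 or size % 2 == 0:
        raise ValueError("kernel size must be a positive odd integer")
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def conv2d_same(image: Grid, kernel: Grid) -> Grid:
    """Zero-padded 2D cross-correlation that keeps the input height and width."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    radius = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - radius, x + dx - radius
                    # Positions outside the image count as zero padding.
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def multi_kernel_responses(image: Grid, sizes: Sequence[int] = (3, 5, 7)) -> Dict[int, Grid]:
    """Apply one convolution per kernel size (assumed sizes) to the same input."""
    return {size: conv2d_same(image, box_kernel(size)) for size in sizes}


def fuse_by_sum(responses: Dict[int, Grid]) -> Grid:
    """Fuse the per-size maps by element-wise summation (an assumed fusion rule)."""
    maps = list(responses.values())
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) for x in range(width)] for y in range(height)]


# A 7x7 image containing a single-pixel "small target" at the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0

responses = multi_kernel_responses(image)
fused = fuse_by_sum(responses)
```

In this toy setting the 3x3 branch keeps the target response concentrated near its location, while the 7x7 branch spreads it across the whole image. This is one way to picture how kernels of different sizes capture context at different scales.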