EFSI-DETR: Efficient Frequency-Semantic Integration for Real-Time Small Object Detection in UAV Imagery

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses small object detection in UAV (drone) imagery, where performance is limited by insufficient feature representation and ineffective multi-scale fusion. It proposes EFSI-DETR, a detection framework that combines a Dynamic Frequency-Spatial Unified Synergy Network (DyFusNet), an Efficient Semantic Feature Concentrator (ESFC), and a Fine-grained Feature Retention (FFR) strategy. The design lets frequency-domain and semantic information work together while balancing detection accuracy against inference cost. Evaluated on the VisDrone and CODrone benchmarks, the method reports state-of-the-art performance: on VisDrone, AP improves by 1.6% and AP_s (small objects) by 5.8%, with real-time inference at 188 FPS on a single RTX 4090 GPU.

📝 Abstract
Real-time small object detection in Unmanned Aerial Vehicle (UAV) imagery remains challenging due to limited feature representation and ineffective multi-scale fusion. Existing methods underutilize frequency information and rely on static convolutional operations, which constrains their capacity to obtain rich feature representations and hinders the effective exploitation of deep semantic features. To address these issues, we propose EFSI-DETR, a novel detection framework that integrates efficient semantic feature enhancement with dynamic frequency-spatial guidance. EFSI-DETR comprises two main components: (1) a Dynamic Frequency-Spatial Unified Synergy Network (DyFusNet) that jointly exploits frequency and spatial cues for robust multi-scale feature fusion, and (2) an Efficient Semantic Feature Concentrator (ESFC) that enables deep semantic extraction with minimal computational cost. Furthermore, a Fine-grained Feature Retention (FFR) strategy incorporates spatially rich shallow features during fusion to preserve the fine-grained details that are crucial for small object detection in UAV imagery. Extensive experiments on the VisDrone and CODrone benchmarks demonstrate that EFSI-DETR achieves state-of-the-art performance with real-time efficiency, yielding improvements of 1.6% in AP and 5.8% in AP_s on VisDrone while reaching an inference speed of 188 FPS on a single RTX 4090 GPU.
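To make the idea of combining frequency and spatial cues more concrete, here is a minimal NumPy sketch. It is not the authors' DyFusNet: the abstract does not specify the architecture, so the function names (`split_frequency`, `fuse`), the circular FFT mask, the `radius` parameter, and the energy-based gate are all assumptions chosen only for illustration. The sketch splits a feature map into low- and high-frequency parts and re-injects the high-frequency part, which tends to carry the fine detail of small objects, through an input-dependent gate.

```python
import numpy as np


def split_frequency(feat, radius):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Hypothetical helper: a circular low-pass mask is applied in the
    centred 2D FFT spectrum; the high-frequency part is the residual.
    """
    _, h, w = feat.shape
    # Centre the zero-frequency component so the mask is a disc around it.
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    low_spec = np.fft.ifftshift(spec * mask, axes=(-2, -1))
    low = np.fft.ifft2(low_spec, axes=(-2, -1)).real
    high = feat - low  # residual, so low + high reconstructs feat
    return low, high


def fuse(feat, radius=4):
    """Add gated high-frequency detail back onto the spatial features.

    The gate is an assumed stand-in for a learned, input-dependent
    weight: a per-channel sigmoid of the mean high-frequency energy.
    """
    _, high = split_frequency(feat, radius)
    energy = np.mean(high ** 2, axis=(1, 2), keepdims=True)
    gate = 1.0 / (1.0 + np.exp(-energy))
    return feat + gate * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 16, 16))
    print(fuse(x).shape)
```

In a trained detector the gate would typically be produced by learned layers rather than a fixed energy statistic; the fixed form here only shows where an input-dependent weight would enter the fusion.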
Problem

Research questions and friction points this paper is trying to address.

small object detection
UAV imagery
real-time detection
feature representation
multi-scale fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Frequency-Spatial Fusion
Efficient Semantic Feature Extraction
Fine-grained Feature Retention
Real-Time Small Object Detection
UAV Imagery
Yu Xia
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
Chang Liu
School of Computer Science, Wuhan University, China
Tianqi Xiang
The Hong Kong University of Science and Technology
Zhigang Tu
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; Wuhan University Shenzhen Research Institute, Shenzhen, China