FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address performance bottlenecks in detecting tiny objects in high-resolution aerial imagery, which stem from weak global contextual awareness in shallow features and the loss of multi-scale detail, this paper proposes a frequency-decoupled, multi-domain coordination framework for detection. Methodologically, it introduces (1) the Wavelet Kolmogorov-Arnold Transformer (WeKat), a backbone that combines cascaded wavelet-based multi-scale decomposition with Kolmogorov-Arnold nonlinear modeling; (2) a lightweight Cross-stage Partial Fusion (CPF) module for multi-scale feature interaction; and (3) a Multi-Domain Feature Coordination (MDFC) module that unifies spatial, frequency, and structural priors to balance low-frequency semantic enhancement against high-frequency detail preservation. On the VisDrone dataset, the method improves on the baseline by 6.5% AP and 8.2% AP₅₀, and the authors report state-of-the-art performance with fewer parameters than competing approaches.
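The "wavelet-based multi-scale decomposition" above can be illustrated with a cascaded 2D Haar transform: each level splits a feature map into a half-resolution low-frequency band (LL) and three high-frequency detail bands, and only LL is decomposed again. This is a minimal sketch under stated assumptions, not the WeKat implementation: the Haar wavelet, the number of levels, and the helper names (`haar_dwt2`, `cascaded_decompose`, and so on) are hypothetical choices made here, and a real backbone would apply this to tensors inside a network rather than to Python lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Bands = Tuple[Matrix, Matrix, Matrix]  # detail sub-bands (LH, HL, HH) of one level


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (even height/width assumed)."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)     # low-frequency average
            r_lh.append((a - b + c - d) / 2)     # left-minus-right detail
            r_hl.append((a + b - c - d) / 2)     # top-minus-bottom detail
            r_hh.append((a - b - c + d) / 2)     # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Exact inverse of haar_dwt2."""
    out: Matrix = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + p + q + r) / 2, (s - p + q - r) / 2]
            bottom += [(s + p - q - r) / 2, (s - p - q + r) / 2]
        out += [top, bottom]
    return out


def cascaded_decompose(x: Matrix, levels: int) -> Tuple[Matrix, List[Bands]]:
    """Repeatedly split the low-pass band, keeping every level's detail bands."""
    details: List[Bands] = []
    low = x
    for _ in range(levels):
        low, lh, hl, hh = haar_dwt2(low)
        details.append((lh, hl, hh))
    return low, details


def cascaded_reconstruct(low: Matrix, details: List[Bands]) -> Matrix:
    """Invert cascaded_decompose, starting from the coarsest level."""
    for lh, hl, hh in reversed(details):
        low = haar_idwt2(low, lh, hl, hh)
    return low


image = [[float((i * 8 + j) % 7) for j in range(8)] for i in range(8)]
low, details = cascaded_decompose(image, levels=2)
print(len(low), len(low[0]))  # 2 2
restored = cascaded_reconstruct(low, details)
```

Because the orthonormal Haar transform is exactly invertible, splitting off the low-frequency context loses no fine-grained information; the detail bands can be kept alongside the coarse band, which is one plausible reading of "enhancing low-frequency context while preserving details".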

📝 Abstract

Aerial-view object detection is a critical technology for real-world applications such as natural resource monitoring, traffic management, and UAV-based search and rescue. Detecting tiny objects in high-resolution aerial imagery presents a long-standing challenge due to their limited visual cues and the difficulty of modeling global context in complex scenes. Existing methods are often hampered by delayed contextual fusion and inadequate non-linear modeling, failing to effectively use global information to refine shallow features and thus encountering a performance bottleneck. To address these challenges, we propose FMC-DETR, a novel framework with frequency-decoupled fusion for aerial-view object detection. First, we introduce the Wavelet Kolmogorov-Arnold Transformer (WeKat) backbone, which applies cascaded wavelet transforms to enhance global low-frequency context perception in shallow features while preserving fine-grained details, and employs Kolmogorov-Arnold networks to achieve adaptive non-linear modeling of multi-scale dependencies. Next, a lightweight Cross-stage Partial Fusion (CPF) module reduces redundancy and improves multi-scale feature interaction. Finally, we introduce the Multi-Domain Feature Coordination (MDFC) module, which unifies spatial, frequency, and structural priors to balance detail preservation and global enhancement. Extensive experiments on benchmark aerial-view datasets demonstrate that FMC-DETR achieves state-of-the-art performance with fewer parameters. On the challenging VisDrone dataset, our model achieves improvements of 6.5% AP and 8.2% AP50 over the baseline, highlighting its effectiveness in tiny object detection. The code can be accessed at https://github.com/bloomingvision/FMC-DETR.
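The Kolmogorov-Arnold networks mentioned in the abstract replace fixed activations with learnable univariate functions: for a layer with inputs x_i, each output is y_j = Σ_i φ_{j,i}(x_i), where every edge (i, j) has its own function φ_{j,i}. The abstract does not say how WeKat parameterizes these functions, so the following is a generic, hypothetical sketch of a KAN layer. It uses piecewise-linear (degree-1 B-spline) functions on a fixed grid, whereas the original KAN formulation uses cubic B-splines plus a SiLU base term; the class names `PiecewiseLinear` and `KANLayer` and the grid range are illustrative assumptions.

```python
import random
from typing import List


class PiecewiseLinear:
    """Learnable 1-D function phi(x): linear interpolation between knot values.

    Equivalent to a degree-1 B-spline on a uniform grid over [lo, hi];
    inputs outside the grid are clamped to its ends.
    """

    def __init__(self, lo: float, hi: float, knots: int, rng: random.Random):
        self.lo, self.hi, self.knots = lo, hi, knots
        # Trainable parameters: one value per knot, small random init.
        self.values = [rng.uniform(-0.1, 0.1) for _ in range(knots)]

    def __call__(self, x: float) -> float:
        x = min(max(x, self.lo), self.hi)
        pos = (x - self.lo) / (self.hi - self.lo) * (self.knots - 1)
        k = min(int(pos), self.knots - 2)  # index of the left knot
        t = pos - k                        # position between knots k and k+1
        return (1 - t) * self.values[k] + t * self.values[k + 1]


class KANLayer:
    """y_j = sum_i phi_{j,i}(x_i), with a separate learnable phi on every edge."""

    def __init__(self, in_dim: int, out_dim: int, knots: int = 8,
                 lo: float = -2.0, hi: float = 2.0, seed: int = 0):
        rng = random.Random(seed)
        self.phi = [[PiecewiseLinear(lo, hi, knots, rng) for _ in range(in_dim)]
                    for _ in range(out_dim)]

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(f(xi) for f, xi in zip(row, x)) for row in self.phi]


layer = KANLayer(in_dim=3, out_dim=2)
print(len(layer([0.5, -1.0, 1.5])))  # 2

# Setting one edge's knot values to g**2 makes that edge approximate x**2.
square = layer.phi[0][0]
grid = [square.lo + k * (square.hi - square.lo) / (square.knots - 1)
        for k in range(square.knots)]
square.values = [g * g for g in grid]
print(square(2.0))  # 4.0
```

The knot values are the trainable parameters and each edge's output is linear in them, which keeps gradients simple; the trade-off is in_dim × out_dim × knots parameters instead of the in_dim × out_dim weights of a dense layer.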

Problem

Detecting tiny objects in high-resolution aerial imagery

Addressing delayed contextual fusion in object detection

Improving global information usage for shallow feature refinement

Innovation

Wavelet Kolmogorov-Arnold Transformer (WeKat) backbone enhances global context perception in shallow features

Cross-stage Partial Fusion module improves multi-scale feature interaction

Multi-Domain Feature Coordination unifies spatial, frequency, and structural priors
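The source does not describe CPF's internals, but its name suggests the cross-stage partial (CSP) pattern: only a share of the channels passes through an expensive block, the rest bypasses it, and the two parts are concatenated and fused by a 1x1 projection. The following is a generic, hypothetical sketch of that pattern on plain nested lists; `cross_stage_partial`, `pointwise_mix`, the 50/50 split, and the stand-in block `double` are assumptions for illustration, not the paper's module.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # [channel][flattened spatial position]


def pointwise_mix(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: output channel o = sum_c weights[o][c] * x[c]."""
    positions = range(len(x[0]))
    return [[sum(w * x[c][p] for c, w in enumerate(row)) for p in positions]
            for row in weights]


def cross_stage_partial(x: FeatureMap,
                        block: Callable[[FeatureMap], FeatureMap],
                        fuse: List[List[float]],
                        split_ratio: float = 0.5) -> FeatureMap:
    """Send only part of the channels through `block`, then fuse.

    The first `split_ratio` share of channels bypasses `block` unchanged; the
    rest is transformed. Both parts are concatenated along the channel axis
    and mixed by a 1x1 projection (`fuse`).
    """
    k = int(len(x) * split_ratio)
    shortcut, processed = x[:k], block(x[k:])
    return pointwise_mix(shortcut + processed, fuse)


def double(part: FeatureMap) -> FeatureMap:
    # Stand-in for an expensive block such as stacked convolutions.
    return [[2 * v for v in channel] for channel in part]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 channels, 2 positions
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(cross_stage_partial(x, double, identity))
# [[1.0, 2.0], [3.0, 4.0], [10.0, 12.0], [14.0, 16.0]]

halve = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # 4 -> 2 channels
print(cross_stage_partial(x, double, halve))
# [[2.0, 3.0], [12.0, 14.0]]
```

Since `block` sees only half the channels here, its cost roughly halves, which is the kind of redundancy reduction CSP-style designs aim for; whether CPF uses this split ratio or fuses inputs from several scales in this way is not stated in the source.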
