Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery

📅 2025-07-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address weak feature representation and inefficient cross-scale fusion in object detection for high-resolution satellite imagery, this paper proposes GLOD. The architecture replaces the conventional CNN backbone with a Swin Transformer to improve long-range dependency modeling, introduces an UpConvMixer upsampling module and a multi-scale Fusion Block for feature reconstruction and fusion, adopts an asymmetric cross-layer fusion strategy that incorporates CBAM attention, and uses a multi-path detection head to strengthen multi-scale object representation. The design is intended to make better use of the spatial priors inherent in satellite imagery. Evaluated on the xView dataset, GLOD achieves 32.95% mAP, surpassing the previous state of the art by 11.46%, while keeping a favorable trade-off between detection accuracy and computational efficiency.

📝 Abstract

We present GLOD, a transformer-first architecture for object detection in high-resolution satellite imagery. GLOD replaces CNN backbones with a Swin Transformer for end-to-end feature extraction, combined with novel UpConvMixer blocks for robust upsampling and Fusion Blocks for multi-scale feature integration. Our approach achieves 32.95% mAP on xView, outperforming prior state-of-the-art methods by 11.46%. Key innovations include asymmetric fusion with CBAM attention and a multi-path head design that captures objects across scales. The architecture is optimized for the challenges of satellite imagery, leveraging spatial priors while maintaining computational efficiency.
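
For readers unfamiliar with the Swin Transformer backbone mentioned above, the following is a minimal NumPy sketch of its core idea: self-attention is computed inside fixed-size local windows, and alternate layers shift the window grid so information can cross window borders. It is an illustration only, not GLOD's implementation. The function names, the 8x8 toy feature map, and the window size of 4 are assumptions chosen for readability, and the attention masking that Swin applies after the cyclic shift is omitted.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)


def window_reverse(windows, H, W):
    """Inverse of window_partition: reassemble windows into (H, W, C)."""
    ws = windows.shape[1]
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def shifted_windows(x, window_size):
    """Cyclically shift the map by half a window, then partition it."""
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


# Toy input (assumed sizes): an 8x8 map with one channel.
feat = np.arange(8 * 8).reshape(8, 8, 1)
wins = window_partition(feat, 4)
restored = window_reverse(wins, 8, 8)
shifted = shifted_windows(feat, 4)
print(wins.shape)
```

Because attention cost grows with the square of the number of tokens in a window rather than in the whole image, this windowing is what makes a transformer backbone practical on large satellite tiles.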

Problem

Research questions and friction points this paper is trying to address.

Improving object detection in high-resolution satellite imagery

Combining Transformers and CNNs for efficient feature extraction

Addressing multi-scale object detection with novel fusion blocks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Swin Transformer replaces CNN backbones

UpConvMixer blocks for robust upsampling

Asymmetric fusion with CBAM attention
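
The CBAM attention named in the list above can be sketched as follows. This is a minimal NumPy forward pass of the generic CBAM block (channel attention followed by spatial attention), assuming random weights, a reduction ratio of 4, and a 7x7 spatial kernel. How GLOD wires CBAM into its asymmetric fusion is not described in this summary, so nothing here should be read as the paper's actual fusion design.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W). w1: (C//r, C) and w2: (C, C//r) form a shared MLP."""
    avg = x.mean(axis=(1, 2))  # global average pool -> (C,)
    mx = x.max(axis=(1, 2))    # global max pool -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear, ReLU, linear

    return sigmoid(mlp(avg) + mlp(mx))  # per-channel weights in (0, 1)


def spatial_attention(x, kernel):
    """x: (C, H, W). kernel: (2, k, k) weights over the [avg, max] maps."""
    k = kernel.shape[-1]
    pad = k // 2
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    conv = np.einsum("chwij,cij->hw", windows, kernel)  # same-size convolution
    return sigmoid(conv)  # per-pixel weights in (0, 1)


def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]


# Assumed toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
kernel = rng.standard_normal((2, 7, 7))
out = cbam(x, w1, w2, kernel)
print(out.shape)
```

Both attention maps only rescale the input, so the output keeps the input's shape and can be dropped between existing fusion layers without changing tensor dimensions.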
