G2HFNet: GeoGran-Aware Hierarchical Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of salient object detection in optical remote sensing imagery, where large scale variations and complex backgrounds hinder existing methods from effectively modeling geometric structures and fine-grained details, often yielding incomplete detection results. To overcome these limitations, the authors propose G2HFNet, a novel architecture built upon the Swin Transformer backbone that introduces, for the first time, a geometry-granularity-aware mechanism. The model incorporates a hierarchical feature fusion framework comprising four key components: Multi-scale Detail Enhancement (MDE), Dual-branch Geometry-Granularity Complementary (DGC), Deep Semantic Perception (DSP), and Local-Global Guided Fusion (LGF). Extensive experiments demonstrate that G2HFNet significantly outperforms state-of-the-art methods across multiple remote sensing datasets, particularly excelling in complex scenes by producing more complete and accurate saliency maps.

📝 Abstract
Remote sensing images captured from aerial perspectives often exhibit significant scale variations and complex backgrounds, posing challenges for salient object detection (SOD). Existing methods typically extract multi-level features at a single scale using uniform attention mechanisms, leading to suboptimal representations and incomplete detection results. To address these issues, we propose a GeoGran-Aware Hierarchical Feature Fusion Network (G2HFNet) that fully exploits geometric and granular cues in optical remote sensing images. Specifically, G2HFNet adopts Swin Transformer as the backbone to extract multi-level features and integrates three key modules: the multi-scale detail enhancement (MDE) module to handle object scale variations and enrich fine details, the dual-branch geo-gran complementary (DGC) module to jointly capture fine-grained details and positional information in mid-level features, and the deep semantic perception (DSP) module to refine high-level positional cues via self-attention. Additionally, a local-global guidance fusion (LGF) module is introduced to replace traditional convolutions for effective multi-level feature integration. Extensive experiments demonstrate that G2HFNet achieves high-quality saliency maps and significantly improves detection performance in challenging remote sensing scenarios.
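The abstract describes a four-stage pipeline: a Swin Transformer backbone extracts multi-level features, MDE enriches low-level detail, DGC combines fine-grained and positional cues in mid-level features, DSP refines high-level semantics, and LGF fuses all levels. The structural sketch below shows only this data flow; every function body is an invented placeholder (the real modules operate on Swin Transformer feature tensors with attention and convolution, none of which is reproduced here).

```python
# Structural sketch of the G2HFNet data flow described in the abstract.
# All transforms are placeholder arithmetic, NOT the paper's actual modules.

def mde(low):
    # Multi-scale Detail Enhancement: enriches fine details in low-level features.
    return [x * 2 for x in low]                 # placeholder transform

def dgc(mid):
    # Dual-branch Geo-Gran Complementary: parallel granularity/geometry branches.
    detail   = [x + 1 for x in mid]             # placeholder "granularity" branch
    position = [x - 1 for x in mid]             # placeholder "geometry" branch
    return [d + p for d, p in zip(detail, position)]

def dsp(high):
    # Deep Semantic Perception: refines high-level positional cues
    # (stand-in for self-attention: add a global summary to each position).
    s = sum(high) / len(high)
    return [x + s for x in high]

def lgf(levels):
    # Local-Global Guidance Fusion: integrates the processed levels
    # (here a simple element-wise sum in place of the learned fusion).
    return [sum(xs) for xs in zip(*levels)]

def g2hfnet_forward(low, mid, high):
    return lgf([mde(low), dgc(mid), dsp(high)])

print(g2hfnet_forward([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))
```

The point of the sketch is the routing: each feature level gets its own specialized module before a single fusion step, rather than uniform attention across all levels.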
Problem

Research questions and friction points this paper is trying to address.

salient object detection
optical remote sensing images
scale variations
complex backgrounds
multi-level feature representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

GeoGran-Aware
Hierarchical Feature Fusion
Swin Transformer
Multi-scale Detail Enhancement
Local-Global Guidance Fusion
Bin Wan
School of Control Science and Engineering, Shandong University, Jinan 250061, China; State Key Laboratory of Autonomous Intelligent Unmanned Systems, Shanghai 201210, China
Runmin Cong
School of Control Science and Engineering, Shandong University, Jinan 250061, China; State Key Laboratory of Autonomous Intelligent Unmanned Systems, Shanghai 201210, China
Xiaofei Zhou
Shanghai Jiao Tong University
Human-Computer Interaction, Educational Technology, AI Education, Augmented Reality, Learning
Hao Fang
University of Edinburgh, School of Engineering
Deep Learning, Medical Imaging, Inverse Problems, Electrical Impedance Tomography, Soft Robotics
Chengtao Lv
Nanyang Technological University
Efficient AI
Sam Kwong
Lingnan University, Hong Kong
Video Coding, Evolutionary Computation, Machine Learning and Pattern Recognition