🤖 AI Summary
This work addresses the main challenges of salient object detection in remote sensing imagery: large scale variations of targets, difficulty in modeling global context, and the high computational cost of self-attention mechanisms. To tackle these issues, the authors propose RDNet, a novel architecture built on a Swin Transformer backbone and enhanced with three key modules: a Dynamic Adaptive Detail-aware (DAD) module, which applies convolution kernels of varied sizes guided by object region proportions; a Frequency-matching Context Enhancement (FCE) module, which integrates wavelet transforms with attention; and a Region Proportion-aware Localization (RPL) module, which employs cross-attention together with a Proportion Guidance block. This integrated design improves detection robustness and localization accuracy across multi-scale objects, achieving state-of-the-art performance on remote sensing salient object detection benchmarks.
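The frequency-domain step of the FCE module can be illustrated with a single-level 2-D Haar decomposition, which splits a feature map into a low-frequency approximation and three high-frequency detail bands. This is a minimal sketch under stated assumptions: the paper's exact wavelet family, levels, and the subsequent attention-based fusion are not specified here, and the box-car Haar filters below merely stand in for the general idea.

```python
import numpy as np

def haar_dwt2(x):
    """One level of 2-D Haar wavelet decomposition.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail bands (LH, HL, HH), each at half the
    spatial resolution. Assumes even height and width.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2   # average adjacent rows (row low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2   # difference adjacent rows (row high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2  # low-low: smooth approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2  # diagonal detail
    return ll, lh, hl, hh

# Toy "feature map": a smooth linear ramp has no diagonal detail.
x = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar_dwt2(x)
```

In an FCE-style module, the low- and high-frequency bands would then interact with features from other decoder stages (e.g. via attention) before being merged back, so that fine detail and coarse context are matched explicitly rather than mixed in one spatial convolution.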
📝 Abstract
Salient object detection (SOD) in remote sensing images faces significant challenges due to large variations in object sizes, the computational cost of self-attention mechanisms, and the limitations of CNN-based extractors in capturing global context and long-range dependencies. Existing methods that rely on fixed convolution kernels often struggle to adapt to diverse object scales, leading to detail loss or irrelevant feature aggregation. To address these issues, this work aims to enhance robustness to scale variations and achieve precise object localization. We propose the Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network (RDNet), which replaces the CNN backbone with the Swin Transformer for global context modeling and introduces three key modules: (1) the Dynamic Adaptive Detail-aware (DAD) module, which applies varied convolution kernels guided by object region proportions; (2) the Frequency-matching Context Enhancement (FCE) module, which enriches contextual information through wavelet interactions and attention; and (3) the Region Proportion-aware Localization (RPL) module, which employs cross-attention to highlight semantic details and integrates a Proportion Guidance (PG) block to assist the DAD module. By combining these modules, RDNet achieves robustness against scale variations and accurate localization, delivering superior detection performance compared with state-of-the-art methods.
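The proportion-guided kernel selection behind the DAD module can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the thresholds, the box-filter kernel, and the helper names (`region_proportion`, `select_kernel_size`) are hypothetical stand-ins for learned, proportion-conditioned dynamic convolution weights.

```python
import numpy as np

def region_proportion(mask):
    """Fraction of the image area covered by the (coarse) salient region."""
    return float(mask.sum()) / mask.size

def select_kernel_size(p, sizes=(3, 5, 7)):
    """Map region proportion to a kernel size: small objects get small
    kernels (preserving detail), large objects get large kernels
    (aggregating context). Thresholds here are illustrative."""
    if p < 0.05:
        return sizes[0]
    if p < 0.25:
        return sizes[1]
    return sizes[2]

def conv2d_same(x, k):
    """Naive 'same' 2-D convolution with zero padding."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

# Toy feature map and a coarse mask containing a small salient object.
rng = np.random.default_rng(0)
feat = rng.random((32, 32))
mask = np.zeros((32, 32))
mask[4:8, 4:8] = 1                       # 16 / 1024 of the image

p = region_proportion(mask)              # small proportion -> small kernel
k = select_kernel_size(p)
kernel = np.full((k, k), 1.0 / (k * k))  # box filter as a stand-in for learned weights
refined = conv2d_same(feat, kernel)
```

The design intent this sketch captures is that the receptive field adapts to the object's footprint instead of being fixed, which is what lets one network handle both tiny and image-filling targets without losing detail or aggregating irrelevant context.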