🤖 AI Summary
This work addresses rural road extraction from high-resolution remote sensing imagery, where diverse surface materials, vegetation occlusion, and narrow road widths degrade performance, particularly for methods designed for urban scenes. To overcome these limitations, the authors propose DSFC-Net, a dual-encoder architecture that integrates spatial and frequency-domain information. A CNN branch captures fine local boundary details, while a Spatial-Frequency Hybrid Transformer models global topological structure to handle occlusions. Key innovations include a Cross-Frequency Interaction Attention module that explicitly decouples high- and low-frequency components via a Laplacian pyramid, and a Channel Feature Fusion Module that adaptively integrates features from both branches. Experiments on the WHU-RuR+, DeepGlobe, and Massachusetts datasets show that DSFC-Net outperforms existing approaches, with better connectivity and accuracy when extracting narrow rural roads.
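To make the idea of separating high- and low-frequency components with a Laplacian pyramid concrete, here is a minimal one-level sketch. It is not the authors' CFIA module: the function names are hypothetical, a 2x2 box filter stands in for the usual Gaussian blur, and the input is assumed to be a 2D list with even height and width.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block.

    Assumption: a box filter replaces the Gaussian used in a classic pyramid,
    and img has even height and width.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample(img):
    """Double resolution by repeating every pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def laplacian_split(img):
    """Return (low, high) such that low + high == img element-wise.

    low  : smoothed copy carrying the coarse layout
    high : residual carrying edges and thin structures
    """
    low = upsample(downsample(img))
    high = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, low)]
    return low, high


# Toy 4x4 patch: a one-pixel-wide bright "road" (column 1) on a dark background.
patch = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
low, high = laplacian_split(patch)
```

In this toy patch the thin road is smeared across two columns in `low`, while `high` keeps the sharp transition at the road edge. This illustrates why a model might want to treat the two bands separately when roads are only a few pixels wide.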
📝 Abstract
Accurate extraction of rural roads from high-resolution remote sensing imagery is essential for infrastructure planning and sustainable development. However, rural settings pose distinct challenges: high intra-class variability and low inter-class separability caused by diverse surface materials, frequent vegetation occlusions that disrupt spatial continuity, and narrow road widths that make detection harder. Existing methods, primarily optimized for structured urban environments, often underperform in these scenarios because they overlook these characteristics. To address these challenges, we propose DSFC-Net, a dual-encoder framework that fuses spatial and frequency-domain information. Specifically, a CNN branch captures fine-grained local road boundaries and short-range continuity, while a novel Spatial-Frequency Hybrid Transformer (SFT) models global topological dependencies robustly under vegetation occlusion. Unlike standard attention mechanisms, which suffer from frequency bias, the SFT incorporates a Cross-Frequency Interaction Attention (CFIA) module that explicitly decouples high- and low-frequency information via a Laplacian pyramid strategy. This design enables dynamic interaction between spatial details and frequency-aware global context, preserving the connectivity of narrow roads. Furthermore, a Channel Feature Fusion Module (CFFM) bridges the two branches by adaptively recalibrating channel-wise feature responses, integrating local textures with global semantics for accurate segmentation. Comprehensive experiments on the WHU-RuR+, DeepGlobe, and Massachusetts datasets validate the superiority of DSFC-Net over state-of-the-art approaches.
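As a rough illustration of channel-wise recalibration when merging two branches, the sketch below blends two feature maps with one gate per channel. It is an assumption-laden stand-in, not the CFFM itself: the names `channel_means` and `fuse` are hypothetical, and the gate is a fixed sigmoid of the difference in channel means, whereas a real module would learn that mapping.

```python
import math


def channel_means(feat):
    """Global average pooling: one scalar per channel of a [C][H][W] nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(cnn_feat, trans_feat, scale=1.0):
    """Blend two feature maps of identical [C][H][W] shape, channel by channel.

    Assumption: the gate is a hand-written function of channel statistics;
    a trained fusion module would replace it with learned weights.
    """
    gates = [
        sigmoid(scale * (a - b))
        for a, b in zip(channel_means(cnn_feat), channel_means(trans_feat))
    ]
    fused = [
        [[g * x + (1.0 - g) * y for x, y in zip(rx, ry)] for rx, ry in zip(cx, cy)]
        for g, cx, cy in zip(gates, cnn_feat, trans_feat)
    ]
    return fused, gates


# Two channels of 2x2 features from each (hypothetical) branch.
cnn = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
trans = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
fused, gates = fuse(cnn, trans)
```

In the first channel the two branches agree, so the gate sits at 0.5 and the blend is an even average. In the second channel the CNN response is stronger, so the gate leans toward the CNN branch.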