SDRNet: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images

📅 2025-06-27

🤖 AI Summary

To address the accuracy loss that class imbalance, object occlusion, and scale variation cause in semantic segmentation of fine-resolution remotely sensed (FRRS) images, this paper proposes SDRNet, a stacked deep residual network. The architecture stacks two encoder-decoder networks to capture long-range semantics while preserving spatial information, and places dilated (atrous) residual blocks between each encoder and decoder to capture global dependencies. On the ISPRS Vaihingen and Potsdam benchmarks, the authors report that SDRNet performs effectively and competitively against current deep convolutional neural networks (DCNNs).

📝 Abstract

Land cover maps generated by semantic segmentation of high-resolution remotely sensed images have drawn much attention in the photogrammetry and remote sensing research community. Massive volumes of fine-resolution remotely sensed (FRRS) images are now available thanks to improving sensing and imaging technologies. However, accurate semantic segmentation of such FRRS images is greatly affected by substantial class disparities, the invisibility of key ground objects due to occlusion, and object size variation. Despite the extraordinary potential of deep convolutional neural networks (DCNNs) for image feature learning and representation, extracting sufficient features from FRRS images for accurate semantic segmentation is still challenging. These challenges require deep learning models to learn robust features and generate sufficient feature descriptors. Specifically, learning multi-contextual features that adequately cover the varied object sizes in a ground scene, and harnessing global-local contexts to overcome class disparities, challenge even very deep networks. Deeper networks also lose significant spatial detail through successive downsampling, resulting in poor segmentation results and coarse boundaries. This article presents a stacked deep residual network (SDRNet) for semantic segmentation of FRRS images. The proposed framework uses two stacked encoder-decoder networks to harness long-range semantics while preserving spatial information, and dilated residual blocks (DRB) between each encoder and decoder network to capture sufficient global dependencies, thus improving segmentation performance. Our experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate that SDRNet performs effectively and competitively against current DCNNs in semantic segmentation.

Problem

Research questions and friction points this paper is trying to address.

Accurate semantic segmentation of fine-resolution remotely sensed images despite class disparities and occlusion.

Learning multi-contextual features to handle varied object sizes in ground scenes.

Preserving spatial details in deep networks to avoid coarse segmentation boundaries.
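The third point can be illustrated with a toy example. The sketch below is not taken from the paper: it uses a hypothetical 1-D label row, max pooling, and nearest-neighbour upsampling (all assumptions chosen for illustration) to show how a small object's boundary becomes coarser as the downsampling factor grows.

```python
def max_pool(row, factor):
    """Non-overlapping max pooling with window size and stride `factor`."""
    return [max(row[i:i + factor]) for i in range(0, len(row), factor)]


def upsample_nearest(row, factor, length):
    """Nearest-neighbour upsampling, cropped back to `length` samples."""
    out = []
    for value in row:
        out.extend([value] * factor)
    return out[:length]


def round_trip(row, factor):
    """Downsample by `factor`, then upsample back to the original length."""
    return upsample_nearest(max_pool(row, factor), factor, len(row))


def mismatches(a, b):
    """Number of positions where two equally long rows disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical scanline: background (0) with a 2-pixel object (1).
row = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for factor in (1, 2, 8):
    restored = round_trip(row, factor)
    print(factor, restored, mismatches(row, restored))
```

With a factor of 1 the row is recovered exactly; with larger factors the 2-pixel object is smeared across the whole pooling window, which is the coarse-boundary effect the abstract attributes to successive downsampling.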

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two stacked encoder-decoder networks that harness long-range semantics.

Dilated residual blocks (DRB) between each encoder and decoder that capture global dependencies.

Preservation of spatial information in deep networks.
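To make the idea of a dilated residual block concrete, here is a minimal 1-D sketch in plain Python. It is not the paper's DRB: the kernel size, the weights, the dilation rates (1, 2, 4), and the function names are all assumptions for illustration. It shows the two generic ingredients, a convolution whose taps are spaced `dilation` samples apart and an identity skip connection, plus the standard receptive-field formula for stacked stride-1 convolutions.

```python
def dilated_conv1d(x, weights, dilation):
    """Zero-padded 'same' 1-D convolution (odd kernel) with taps `dilation` apart."""
    half = (len(weights) // 2) * dilation  # padding needed on each side
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def dilated_residual_block(x, weights, dilation):
    """y = x + ReLU(dilated_conv(x)); the skip path passes the input through unchanged."""
    branch = relu(dilated_conv1d(x, weights, dilation))
    return [a + b for a, b in zip(x, branch)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# A unit impulse shows which input positions each output can "see".
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
print(dilated_residual_block(impulse, [1.0, 1.0, 1.0], dilation=2))

# Assumed dilation rates, compared with three ordinary (dilation 1) layers.
print(receptive_field(3, [1, 2, 4]), receptive_field(3, [1, 1, 1]))
```

The comparison at the end shows why dilation helps with global context: three 3-tap layers with rates 1, 2, and 4 cover 15 input samples, against 7 for three undilated layers, without any downsampling and with the same number of weights.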
