DB SwinT: A Dual-Branch Swin Transformer Network for Road Extraction in Optical Remote Sensing Imagery

📅 2026-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fragmented road structures and reduced extraction accuracy in optical remote sensing imagery, which arise when roads are occluded by trees, buildings, and other objects. The authors propose a dual-branch Swin Transformer network (DB SwinT) that incorporates a U-Net-inspired multi-scale feature fusion strategy. The encoder uses two branches: a local branch that recovers fine structural details in occluded regions, and a global branch that captures broader semantic context to preserve the continuity of road networks. An Attentional Feature Fusion (AFF) module adaptively fuses the features from the two branches. On the Massachusetts and DeepGlobe road datasets, DB SwinT achieves Intersection over Union (IoU) scores of 79.35% and 74.84%, respectively.
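The idea of adaptively fusing two feature streams can be illustrated with a minimal sketch. The summary does not describe the internals of the paper's AFF module, so the code below is only a generic sigmoid-gated blend, not the authors' implementation. The function name `gated_fusion` and the scalar parameters `gate_weight` and `gate_bias` are hypothetical stand-ins for what would be a learned sub-network.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, gate_weight=1.0, gate_bias=0.0):
    """Blend two equal-length feature vectors with a per-element gate.

    Assumption: the gate is a sigmoid of the summed features, scaled by
    hypothetical scalars. A real attention module would learn this mapping.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for loc, glob in zip(local_feat, global_feat):
        # Gate in (0, 1): how much weight the local feature receives.
        alpha = sigmoid(gate_weight * (loc + glob) + gate_bias)
        # Convex combination of the local and global features.
        fused.append(alpha * loc + (1.0 - alpha) * glob)
    return fused


if __name__ == "__main__":
    local_branch = [1.0, -2.0, 0.5]
    global_branch = [1.0, 2.0, 3.0]
    print(gated_fusion(local_branch, global_branch))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding local and global values, so neither branch can be ignored entirely.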

📝 Abstract

With the continuous improvement in the spatial resolution of optical remote sensing imagery, accurate road extraction has become increasingly important for applications such as urban planning, traffic monitoring, and disaster management. However, road extraction in complex urban and rural environments remains challenging, as roads are often occluded by trees, buildings, and other objects, leading to fragmented structures and reduced extraction accuracy. To address this problem, this paper proposes a Dual-Branch Swin Transformer network (DB SwinT) for road extraction. The proposed framework combines the long-range dependency modeling capability of the Swin Transformer with the multi-scale feature fusion strategy of U-Net, and employs a dual-branch encoder to learn complementary local and global representations. Specifically, the local branch focuses on recovering fine structural details in occluded areas, while the global branch captures broader semantic context to preserve the overall continuity of road networks. In addition, an Attentional Feature Fusion (AFF) module is introduced to adaptively fuse features from the two branches, further enhancing the representation of occluded road segments. Experimental results on the Massachusetts and DeepGlobe datasets show that DB SwinT achieves Intersection over Union (IoU) scores of 79.35% and 74.84%, respectively, demonstrating its effectiveness for road extraction from optical remote sensing imagery.
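For readers unfamiliar with the reported metric, IoU for a binary road mask is the number of pixels labeled road in both the prediction and the ground truth, divided by the number labeled road in either. The sketch below computes it on toy masks. The masks are made up for illustration, and the handling of two empty masks is an assumed convention; the paper's exact evaluation protocol is not given in the abstract.

```python
def iou(pred, target):
    """Intersection over Union for two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # road in both masks
            if p or t:
                union += 1  # road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


if __name__ == "__main__":
    # Toy 3x4 masks, 1 = road pixel (illustrative data only).
    pred = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
    ]
    target = [
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 0],
    ]
    print(iou(pred, target))  # 3 shared pixels / 5 total road pixels = 0.6
```

A score of 79.35% therefore means that, summed over the evaluated pixels, the overlap between predicted and true road pixels is about 79% of their combined area.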

Problem

Research questions and friction points this paper is trying to address.

road extraction

optical remote sensing imagery

occlusion

fragmented structures

urban and rural environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Branch Architecture

Swin Transformer

Attentional Feature Fusion