DB SwinT: A Dual-Branch Swin Transformer Network for Road Extraction in Optical Remote Sensing Imagery

2026-03-25
AI Summary
This work addresses fragmented road structures and low extraction accuracy in optical remote sensing imagery, both caused by occlusion from trees, buildings, and other objects. The authors propose a dual-branch Swin Transformer network that adopts a U-Net-inspired multi-scale feature fusion strategy. The encoder has two branches: a local branch that recovers fine structural details in occluded regions, and a global branch that captures broader semantic context to preserve the continuity of the road network. An Attentional Feature Fusion (AFF) module adaptively fuses the features from the two branches, so that local detail reconstruction is balanced against global context modeling. The reported Intersection over Union (IoU) scores are 79.35% on the Massachusetts road dataset and 74.84% on the DeepGlobe road dataset.
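Since the reported results are IoU scores, a small worked example of the metric may help. The sketch below computes IoU for a single pair of binary road masks in plain Python. The function name `binary_iou`, the toy masks, and the convention for two empty masks are illustrative assumptions; the paper's own evaluation protocol (thresholding, averaging across images) is not described in this summary.

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as equally sized 2D lists of 0/1.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is road in both masks
            if p or t:
                union += 1  # pixel is road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks: 1 = road pixel, 0 = background.
pred = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
target = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

iou = binary_iou(pred, target)  # 3 shared road pixels out of 5 in the union
print(f"IoU = {iou:.2f}")
```

Here the two masks agree on three road pixels and disagree on two, giving an IoU of 3/5. A broken road segment lowers the intersection without shrinking the union, which is why fragmented predictions are penalized by this metric.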

πŸ“ Abstract
With the continuous improvement in the spatial resolution of optical remote sensing imagery, accurate road extraction has become increasingly important for applications such as urban planning, traffic monitoring, and disaster management. However, road extraction in complex urban and rural environments remains challenging, as roads are often occluded by trees, buildings, and other objects, leading to fragmented structures and reduced extraction accuracy. To address this problem, this paper proposes a Dual-Branch Swin Transformer network (DB SwinT) for road extraction. The proposed framework combines the long-range dependency modeling capability of the Swin Transformer with the multi-scale feature fusion strategy of U-Net, and employs a dual-branch encoder to learn complementary local and global representations. Specifically, the local branch focuses on recovering fine structural details in occluded areas, while the global branch captures broader semantic context to preserve the overall continuity of road networks. In addition, an Attentional Feature Fusion (AFF) module is introduced to adaptively fuse features from the two branches, further enhancing the representation of occluded road segments. Experimental results on the Massachusetts and DeepGlobe datasets show that DB SwinT achieves Intersection over Union (IoU) scores of 79.35% and 74.84%, respectively, demonstrating its effectiveness for road extraction from optical remote sensing imagery.
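The idea of adaptively fusing local and global features can be illustrated with a very small gating example. The sketch below is a generic sigmoid-gated blend, not the paper's AFF module, whose internal layers are not described in the abstract. The names `gated_fusion`, `weight`, and `bias` are hypothetical, and the scalar `weight` and `bias` stand in for whatever learned attention layers the real module uses.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, weight=1.0, bias=0.0):
    """Blend two equally long feature vectors with a per-element gate.

    Illustrative assumption: the gate is a sigmoid of the summed features,
    scaled by a scalar weight and shifted by a scalar bias.
    """
    fused = []
    for l, g in zip(local_feat, global_feat):
        gate = sigmoid(weight * (l + g) + bias)  # value in (0, 1)
        # Convex combination: gate favors the local feature,
        # (1 - gate) favors the global feature.
        fused.append(gate * l + (1.0 - gate) * g)
    return fused


# Toy features from the two branches at three positions.
local_feat = [1.0, -1.0, 2.0]
global_feat = [-1.0, 1.0, 2.0]

fused = gated_fusion(local_feat, global_feat)
print(fused)
```

Because each output is a convex combination, the fused value always lies between the local and global values at that position. Where the two branches agree, the output matches both; where they disagree, the gate decides how much of each to keep.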
Problem

Research questions and friction points this paper is trying to address.

road extraction
optical remote sensing imagery
occlusion
fragmented structures
urban and rural environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Branch Architecture
Swin Transformer
Attentional Feature Fusion
Road Extraction
Remote Sensing Imagery
Zongyang He
School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
Xiangli Yang
School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
Xian Gao
Shanghai Jiao Tong University
LLM, Multi-modal, AI for Education
Zhiguo Wang
College of Information Engineering, Inner Mongolia University of Technology, Inner Mongolia Key Laboratory of Radar Technology and Application, Hohhot 010051, China