UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction

📅 2025-12-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Remote sensing imagery exhibits high variability in building structures, a gap between feature pyramid levels, and insufficient global-local feature fusion, all of which lead to ambiguous segmentation boundaries and degraded accuracy. To address these challenges, we propose UAGLNet, an uncertainty-guided end-to-end building extraction framework. Our key contributions are: (1) a cooperative encoder with hybrid CNN and transformer layers that jointly captures local details and long-range dependencies; (2) a cooperative interaction block (CIB) that narrows the gap between local and global features as the network deepens; (3) a Global-Local Fusion (GLF) module that fuses the two representations complementarily; and (4) an Uncertainty-Aggregated Decoder (UAD) that estimates pixel-wise uncertainty to reduce segmentation ambiguity in uncertain regions. Evaluated on multiple benchmark remote sensing datasets, our method achieves state-of-the-art performance, significantly improving boundary sharpness and recall for small-scale buildings. The source code is publicly available.

📝 Abstract

Building extraction from remote sensing images is a challenging task due to the complex structural variations of buildings. Existing methods employ convolutional or self-attention blocks to capture multi-scale features in segmentation models, but the inherent gap between feature pyramid levels and insufficient global-local feature integration lead to inaccurate, ambiguous extraction results. To address this issue, we present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet), which is capable of exploiting high-quality global-local visual semantics under the guidance of uncertainty modeling. Specifically, we propose a novel cooperative encoder, which adopts hybrid CNN and transformer layers at different stages to capture the local and global visual semantics, respectively. An intermediate cooperative interaction block (CIB) is designed to narrow the gap between the local and global features as the network becomes deeper. Afterwards, we propose a Global-Local Fusion (GLF) module to complementarily fuse the global and local representations. Moreover, to mitigate the segmentation ambiguity in uncertain regions, we propose an Uncertainty-Aggregated Decoder (UAD) that explicitly estimates pixel-wise uncertainty to enhance the segmentation accuracy. Extensive experiments demonstrate that our method achieves superior performance to other state-of-the-art methods. Our code is available at https://github.com/Dstate/UAGLNet
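
To make the idea of fusing global and local representations more concrete, here is a minimal sketch of a per-pixel gated blend of two feature maps. This is not the GLF module from the paper, whose internals are not described in the abstract. The function names `sigmoid` and `gated_fusion` are hypothetical, and the hand-written gate stands in for weights that a real network would learn.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat):
    """Blend two same-shaped 2D feature maps with a per-pixel gate.

    Assumption: the gate is a fixed function of the two inputs. In a
    trained network it would come from learned layers instead.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("feature maps must have the same height")
    fused = []
    for row_l, row_g in zip(local_feat, global_feat):
        if len(row_l) != len(row_g):
            raise ValueError("feature maps must have the same width")
        fused_row = []
        for l, g in zip(row_l, row_g):
            gate = sigmoid(l - g)  # in (0, 1); favors the larger response
            # Convex combination, so the result lies between l and g.
            fused_row.append(gate * l + (1.0 - gate) * g)
        fused.append(fused_row)
    return fused


local_map = [[1.0, 0.0]]   # toy "CNN" (local) features
global_map = [[1.0, 2.0]]  # toy "transformer" (global) features
fused_map = gated_fusion(local_map, global_map)
```

Because the gate always lies strictly between 0 and 1, each fused value stays between the local and global responses at that pixel, which is one simple way to let the two branches complement rather than overwrite each other.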

Problem

Research questions and friction points this paper is trying to address.

Extracts buildings from complex remote sensing images

Integrates global-local features to reduce segmentation ambiguity

Models pixel-wise uncertainty to improve extraction accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN-Transformer encoder captures local and global semantics

Global-Local Fusion module integrates complementary representations

Uncertainty-Aggregated Decoder enhances accuracy via pixel-wise uncertainty
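
As an illustration of pixel-wise uncertainty, the sketch below scores each pixel of a predicted building-probability map with binary entropy and uses that score to up-weight the loss in ambiguous regions. Entropy is a common stand-in chosen here for illustration; the summary does not say how the UAD actually estimates or aggregates uncertainty. All names (`binary_entropy`, `uncertainty_map`, `uncertainty_weighted_bce`, `alpha`) are hypothetical.

```python
import math

EPS = 1e-7  # keeps log() away from zero


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; peaks at p = 0.5."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_map(prob_map):
    """Per-pixel uncertainty for a 2D map of building probabilities."""
    return [[binary_entropy(p) for p in row] for row in prob_map]


def uncertainty_weighted_bce(prob_map, label_map, alpha: float = 1.0) -> float:
    """Mean binary cross-entropy with weight 1 + alpha * entropy per pixel.

    Assumption: a simple re-weighting scheme, not the paper's objective.
    With alpha = 0 this reduces to plain binary cross-entropy.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(prob_map, label_map):
        for p, y in zip(prob_row, label_row):
            p_safe = min(max(p, EPS), 1.0 - EPS)
            bce = -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
            total += (1.0 + alpha * binary_entropy(p)) * bce
            count += 1
    return total / count


probs = [[0.9, 0.5]]   # toy predictions: confident, then ambiguous
labels = [[1, 0]]      # 1 = building, 0 = background
u_map = uncertainty_map(probs)
plain_loss = uncertainty_weighted_bce(probs, labels, alpha=0.0)
weighted_loss = uncertainty_weighted_bce(probs, labels, alpha=1.0)
```

In this toy case the second pixel (probability 0.5) receives the maximum uncertainty of one bit, so it contributes more to the weighted loss than the confident first pixel.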
