EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-resolution remote sensing image semantic segmentation is challenging because of complex spatial layouts, multi-scale objects, and the difficulty of preserving fine-grained local detail and global contextual semantics at the same time. To address this, we propose the Region-Aware Proxy Network (RAPNet), a Transformer-based framework that operates at the region level rather than on a fixed grid: it constructs proxy nodes from semantically coherent regions, models long-range dependencies among them via region-context attention, and refines inter-class relationships across regions using a globally learned class-wise attention map. RAPNet combines region-level feature aggregation, semantic mask guidance, and adaptive multi-source feature fusion, aiming to balance local structural fidelity against global semantic coherence while keeping computation manageable. Experiments on three benchmark remote sensing datasets show RAPNet outperforming state-of-the-art methods, with reported gains in boundary consistency and multi-class segmentation accuracy.

📝 Abstract
High-resolution remote sensing (HRRS) image segmentation is challenging due to complex spatial layouts and diverse object appearances. While CNNs excel at capturing local features, they struggle with long-range dependencies, whereas Transformers can model global context but often neglect local details and are computationally expensive. We propose a novel approach, Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely on grid-based layouts, RAPNet operates at the region level for more flexible segmentation. The CRA module uses a Transformer to capture region-level contextual dependencies, generating a Semantic Region Mask (SRM). The GCR module learns a global class attention map to refine multi-class information, combining the SRM and attention map for accurate segmentation. Experiments on three public datasets show that RAPNet outperforms state-of-the-art methods, achieving superior multi-class segmentation accuracy.
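To make the region-level idea in the abstract concrete, here is a minimal sketch of attention over region proxies. It is an illustration under stated assumptions, not the paper's CRA module: the abstract does not say how regions are formed or how proxies are computed, so this sketch assumes precomputed region ids, takes the mean feature of each region as its proxy, and uses single-head scaled dot-product self-attention with identity projections. The function names `region_proxies` and `proxy_self_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def region_proxies(features, region_ids):
    """Average the pixel feature vectors of each region into one proxy vector.

    Assumption: mean pooling stands in for whatever aggregation the paper uses.
    """
    sums, counts = {}, {}
    for feat, rid in zip(features, region_ids):
        acc = sums.setdefault(rid, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[rid] = counts.get(rid, 0) + 1
    return {rid: [v / counts[rid] for v in acc] for rid, acc in sums.items()}


def proxy_self_attention(proxies):
    """Single-head scaled dot-product self-attention across region proxies.

    Assumption: queries, keys, and values are the proxies themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    ids = sorted(proxies)
    dim = len(proxies[ids[0]])
    scale = math.sqrt(dim)
    out = {}
    for q in ids:
        scores = [
            sum(a * b for a, b in zip(proxies[q], proxies[k])) / scale
            for k in ids
        ]
        weights = softmax(scores)
        out[q] = [
            sum(w * proxies[k][i] for w, k in zip(weights, ids))
            for i in range(dim)
        ]
    return out


# Toy input: four pixels with 2-D features, grouped into two regions.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
region_ids = [0, 0, 1, 1]

proxies = region_proxies(features, region_ids)
context = proxy_self_attention(proxies)
```

Because attention runs over one proxy per region instead of one token per pixel or patch, its cost scales with the number of regions, which is the usual motivation for proxy-style designs.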
Problem

Research questions and friction points this paper is trying to address.

Addresses complex spatial layouts in HRRS image segmentation
Balances local features and global context for segmentation
Improves multi-class segmentation accuracy with region-level processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-level segmentation with Contextual Region Attention
Global Class Refinement for multi-class accuracy
Combines Transformer and region-aware proxy
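The refinement step listed above can be illustrated with a small sketch that combines region-level class scores with a class attention map. This is one possible reading, not the paper's GCR module: it assumes the Semantic Region Mask can be represented as per-region class scores, that the class attention map holds one weight per pixel and class, and that the two are combined by elementwise product followed by an argmax. The names `broadcast_region_scores` and `refine_with_class_attention` are hypothetical.

```python
def broadcast_region_scores(region_scores, region_ids):
    """Copy each region's class scores to every pixel in that region."""
    return [list(region_scores[rid]) for rid in region_ids]


def refine_with_class_attention(pixel_scores, class_attention):
    """Weight per-pixel class scores by attention, then pick the best class.

    Assumption: elementwise product is used as the combination rule;
    the source only says the two maps are combined.
    """
    labels = []
    for scores, attn in zip(pixel_scores, class_attention):
        combined = [s * a for s, a in zip(scores, attn)]
        labels.append(max(range(len(combined)), key=combined.__getitem__))
    return labels


# Toy input: two regions, two classes, four pixels.
region_scores = {0: [0.6, 0.4], 1: [0.3, 0.7]}
region_ids = [0, 0, 1, 1]
# Hypothetical class attention weights, one row per pixel.
class_attention = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.5, 0.5]]

pixel_scores = broadcast_region_scores(region_scores, region_ids)
labels = refine_with_class_attention(pixel_scores, class_attention)
print(labels)  # [0, 1, 1, 1]
```

In this toy case the second pixel belongs to a region that leans toward class 0, but its attention weights favor class 1, so the combined score flips its label. That is the kind of per-class correction a refinement stage is meant to provide.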
Authors
Yichun Yu
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Yuqing Lan
National University of Defense Technology
3D Vision, Computer Graphics
Zhihuan Xing
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Xiaoyi Yang
School of Software, Beihang University, Beijing 100191, China
Tingyue Tang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Dan Yu
China Standard Intelligent Security, Beijing 100097, China