🤖 AI Summary
Semantic segmentation of high-resolution remote sensing images faces several challenges: complex spatial layouts, multi-scale objects, and the difficulty of jointly preserving fine-grained local details and global contextual semantics. To address these challenges, we propose the Region-Aware Proxy Network (RAPNet), the first Transformer-based framework operating at the region level. It constructs proxy nodes from semantically coherent regions and models long-range dependencies via region-context attention, while refining inter-class relationships across regions using a globally learned class-wise attention map. RAPNet integrates region-level feature aggregation, semantic mask guidance, and adaptive multi-source feature fusion, balancing local structural fidelity against global semantic coherence without compromising computational efficiency. Extensive experiments demonstrate that RAPNet significantly outperforms state-of-the-art methods on three benchmark remote sensing datasets, with consistent improvements in boundary consistency and multi-class segmentation accuracy.
📝 Abstract
High-resolution remote sensing (HRRS) image segmentation is challenging due to complex spatial layouts and diverse object appearances. While CNNs excel at capturing local features, they struggle with long-range dependencies, whereas Transformers can model global context but often neglect local details and are computationally expensive. We propose a novel approach, Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely on grid-based layouts, RAPNet operates at the region level for more flexible segmentation. The CRA module uses a Transformer to capture region-level contextual dependencies, generating a Semantic Region Mask (SRM). The GCR module learns a global class attention map to refine multi-class information, combining the SRM and attention map for accurate segmentation. Experiments on three public datasets show that RAPNet outperforms state-of-the-art methods, achieving superior multi-class segmentation accuracy.
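The abstract gives no implementation details, so the following is only a toy NumPy sketch of the general idea of region-level processing, not the authors' architecture. It assumes a region map is already available, uses average pooling to form one proxy per region, applies a single-head self-attention without learned projections as a stand-in for CRA, and mixes class scores with a class-to-class softmax as a stand-in for GCR. All function names, shapes, and the random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_proxies(features, region_ids, num_regions):
    # Average the pixel features inside each region: one proxy per region.
    # features: (H, W, C); region_ids: (H, W) integers in [0, num_regions).
    flat_f = features.reshape(-1, features.shape[-1])
    flat_r = region_ids.reshape(-1)
    sums = np.zeros((num_regions, flat_f.shape[1]))
    np.add.at(sums, flat_r, flat_f)  # unbuffered scatter-add per region
    counts = np.bincount(flat_r, minlength=num_regions).astype(float)
    return sums / np.maximum(counts, 1.0)[:, None]

def region_attention(proxies):
    # Assumption: plain scaled dot-product self-attention over the proxies,
    # with no learned query/key/value projections.
    d = proxies.shape[1]
    weights = softmax(proxies @ proxies.T / np.sqrt(d), axis=-1)
    return weights @ proxies

def segment(features, region_ids, num_regions, class_embed):
    # class_embed: (K, C) hypothetical class embeddings.
    proxies = region_attention(region_proxies(features, region_ids, num_regions))
    region_probs = softmax(proxies @ class_embed.T)    # (R, K) per-region class scores
    class_attn = softmax(class_embed @ class_embed.T)  # (K, K) class-to-class weights
    refined = region_probs @ class_attn                # (R, K) refined scores
    return refined[region_ids]                         # broadcast back to (H, W, K)

# Random stand-in data; a real pipeline would use backbone features and
# a learned or precomputed region partition.
rng = np.random.default_rng(0)
H, W, C, R, K = 4, 6, 8, 3, 5
features = rng.normal(size=(H, W, C))
region_ids = rng.integers(0, R, size=(H, W))
class_embed = rng.normal(size=(K, C))

probs = segment(features, region_ids, R, class_embed)
labels = probs.argmax(axis=-1)
```

Because attention runs over `R` region proxies rather than `H * W` pixels, its cost in this sketch scales with the number of regions, which illustrates why region-level processing can be cheaper than pixel-level global attention. Every pixel in a region receives the same prediction here; how RAPNet recovers fine boundaries is not described in the abstract.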