EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy

📅 2025-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-resolution remote sensing image semantic segmentation is difficult because of complex spatial layouts, multi-scale objects, and the need to preserve fine-grained local detail and global contextual semantics at the same time. To address this, the paper proposes the Region-Aware Proxy Network (RAPNet), a Transformer-based framework that operates at the region level rather than on a fixed grid. It constructs proxy nodes from semantically coherent regions, models long-range dependencies through region-context attention, and refines inter-class relationships across regions with a globally learned class-wise attention map. RAPNet combines region-level feature aggregation, semantic mask guidance, and adaptive multi-source feature fusion, aiming to balance local structural fidelity against global semantic coherence without sacrificing computational efficiency. Experiments on three benchmark remote sensing datasets show that RAPNet outperforms state-of-the-art methods, with consistent improvements in boundary consistency and multi-class segmentation accuracy.

📝 Abstract

High-resolution remote sensing (HRRS) image segmentation is challenging due to complex spatial layouts and diverse object appearances. While CNNs excel at capturing local features, they struggle with long-range dependencies, whereas Transformers can model global context but often neglect local details and are computationally expensive. We propose a novel approach, the Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely on grid-based layouts, RAPNet operates at the region level for more flexible segmentation. The CRA module uses a Transformer to capture region-level contextual dependencies and generates a Semantic Region Mask (SRM). The GCR module learns a global class attention map to refine multi-class information, and the SRM and the attention map are then combined to produce the final segmentation. Experiments on three public datasets show that RAPNet outperforms state-of-the-art methods, achieving superior multi-class segmentation accuracy.
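To make the region-level idea concrete, here is a minimal sketch of attention over region proxies instead of over grid cells. It is not the authors' implementation: the function names, the mean-pooling used to build proxies, the identity query/key/value projections, and the precomputed `region_ids` (for example from a superpixel algorithm) are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math

def region_proxies(features, region_ids, num_regions):
    """Mean-pool pixel features into one proxy vector per region (assumed pooling)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_regions)]
    counts = [0] * num_regions
    for feat, r in zip(features, region_ids):
        counts[r] += 1
        for d in range(dim):
            sums[r][d] += feat[d]
    # max(c, 1) avoids division by zero for a region with no pixels
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_self_attention(proxies):
    """Scaled dot-product self-attention over proxies.

    Identity projections stand in for the learned Q/K/V weights a real
    Transformer layer would have (an assumption to keep the sketch short).
    """
    dim = len(proxies[0])
    scale = math.sqrt(dim)
    out = []
    for q in proxies:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in proxies]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, proxies))
                    for d in range(dim)])
    return out

def scatter_to_pixels(proxies, region_ids):
    """Give every pixel the context vector of the region it belongs to."""
    return [proxies[r] for r in region_ids]

# Toy input: 6 pixels with 2-d features, grouped into 3 hypothetical regions.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [1.0, 1.0], [1.0, 1.0]]
region_ids = [0, 0, 1, 1, 2, 2]

proxies = region_proxies(features, region_ids, 3)
context = proxy_self_attention(proxies)
per_pixel = scatter_to_pixels(context, region_ids)
```

The point of the design is cost: attention runs over the number of regions (3 here) rather than the number of pixels (6 here), so the quadratic term grows with the region count. How RAPNet actually forms regions, and how the resulting mask is combined with the global class attention map, is not described in enough detail here to sketch.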

Problem

Research questions and friction points this paper is trying to address.

Addresses complex spatial layouts in HRRS image segmentation

Balances local features and global context for segmentation

Improves multi-class segmentation accuracy with region-level processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-level segmentation with Contextual Region Attention

Global Class Refinement for multi-class accuracy

Combines Transformer and region-aware proxy
