🤖 AI Summary
Remote sensing image semantic segmentation faces challenges including complex background interference, multi-scale and multi-orientation variations, and large intra-class diversity, which limit the performance of existing methods. To address these issues, we propose LOGCAN++, a local-global class-aware network. First, we design a Global Class Awareness (GCA) module to model class-level contextual dependencies and suppress background noise. Second, we introduce an affine-transformation-driven Local Class Awareness (LCA) module to adaptively align multi-scale and multi-orientation features, bridging the gap between pixel-level representations and class-level semantics. Third, we construct a global-local collaborative class-aware architecture that jointly optimizes class-level semantic consistency and pixel-level discriminability. LOGCAN++ achieves state-of-the-art performance on three major remote sensing benchmarks, significantly outperforming both general-purpose and domain-specific methods, while attaining a superior trade-off between accuracy and inference efficiency. The source code is publicly available.
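The affine-driven alignment in the LCA module can be illustrated with a spatial-transformer-style bilinear sampler: a predicted 2×3 affine matrix warps the sampling grid so that a local window can be rescaled and rotated before class representations are extracted. The sketch below is a toy, pure-Python illustration under assumed shapes (a single-channel feature grid), not the authors' implementation; the function name `affine_sample` is hypothetical.

```python
import math

def affine_sample(feat, theta, out_h, out_w):
    """Bilinearly sample `feat` (an H x W grid of scalars) through a 2x3
    affine matrix `theta` that maps normalized output coordinates in
    [-1, 1] to normalized input coordinates, with zero padding outside
    the input. This mirrors the grid-sampling step used to adaptively
    align local regions under scale/orientation changes."""
    H, W = len(feat), len(feat[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Normalized output coordinates in [-1, 1].
            y = -1.0 + 2.0 * i / max(out_h - 1, 1)
            x = -1.0 + 2.0 * j / max(out_w - 1, 1)
            # Affine-warped source location (still normalized).
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            # Convert back to pixel coordinates.
            px = (xs + 1.0) * (W - 1) / 2.0
            py = (ys + 1.0) * (H - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            # Bilinear interpolation over the four neighbors.
            for dy in (0, 1):
                for dx in (0, 1):
                    xi, yi = x0 + dx, y0 + dy
                    if 0 <= xi < W and 0 <= yi < H:
                        w = (1 - abs(px - xi)) * (1 - abs(py - yi))
                        out[i][j] += w * feat[yi][xi]
    return out
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the sampler returns the input unchanged; a matrix like `[[0.5, 0, 0], [0, 0.5, 0]]` zooms into the central region, which is how scale variation can be tolerated at sampling time.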
📝 Abstract
Remote sensing images are usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully address these issues, and thus their performance on remote sensing image segmentation is limited. In this paper, we propose LOGCAN++, a semantic segmentation model customized for remote sensing images, which consists of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The GCA module captures global representations for class-level context modeling to reduce the interference of background noise. The LCA module generates local class representations as intermediate perceptual elements that indirectly associate pixels with the global class representations, targeting the large intra-class variance problem. In particular, we introduce affine transformations in the LCA module to adaptively extract local class representations, effectively tolerating scale and orientation variations in remote sensing images. Extensive experiments on three benchmark datasets show that LOGCAN++ outperforms current mainstream general-purpose and remote sensing semantic segmentation methods and achieves a better trade-off between speed and accuracy. Code is available at https://github.com/xwmaxwma/rssegmentation.
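The class-level context modeling described for the GCA module can be sketched as a two-step procedure: aggregate one representation per class as a probability-weighted average of pixel features, then let each pixel attend to those few class representations instead of all other pixels. The toy code below is a minimal pure-Python illustration under assumed shapes (N pixels, D-dim features, K classes, a coarse per-pixel score map); names such as `class_aware_context` are illustrative, not the authors' API.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def class_aware_context(features, class_logits):
    """features: N x D pixel features; class_logits: N x K coarse scores.

    Step 1: class representations = probability-weighted average of pixel
            features (one D-dim vector per class).
    Step 2: re-express each pixel by attending over the K class
            representations, which suppresses background noise and is far
            cheaper than full N x N self-attention.
    """
    N, D = len(features), len(features[0])
    K = len(class_logits[0])
    probs = [_softmax(row) for row in class_logits]  # N x K
    # Step 1: aggregate per-class representations.
    class_reps = [[0.0] * D for _ in range(K)]
    norm = [sum(probs[n][k] for n in range(N)) + 1e-8 for k in range(K)]
    for n in range(N):
        for k in range(K):
            for d in range(D):
                class_reps[k][d] += probs[n][k] * features[n][d]
    class_reps = [[v / norm[k] for v in class_reps[k]] for k in range(K)]
    # Step 2: pixel-to-class dot-product attention.
    out = []
    for n in range(N):
        scores = [sum(features[n][d] * class_reps[k][d] for d in range(D))
                  for k in range(K)]
        attn = _softmax(scores)
        out.append([sum(attn[k] * class_reps[k][d] for k in range(K))
                    for d in range(D)])
    return class_reps, out
```

Because the attention keys are only the K class representations, the cost of the context step scales with N*K rather than N*N, which is one plausible reason a class-aware design helps the speed/accuracy trade-off reported in the abstract.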