🤖 AI Summary
Semantic segmentation of high-resolution remote sensing images suffers from ambiguous boundaries and class confusion, both caused by high inter-class similarity and large intra-class variability. To address this, we propose a bidirectional collaborative refinement framework. Its core contributions are: (1) a heatmap-driven bidirectional information synergy (HBIS) module that establishes a mutual, heatmap-based mapping between feature maps and class embeddings; (2) an interpretable hierarchical supervision strategy that uses multi-scale heatmaps to guide feature learning at multiple granularities; and (3) a cross-layer class embedding Fisher discriminative loss that enhances the separability of embeddings in the latent space. Extensive experiments demonstrate state-of-the-art performance on three benchmark datasets (LoveDA, Vaihingen, and Potsdam), with gains in both segmentation accuracy and model interpretability. The proposed framework offers a robust, transparent, and semantically grounded approach to segmentation in complex remote sensing scenes.
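To make the idea of a heatmap-based mapping between feature maps and class embeddings more concrete, here is a minimal pure-Python sketch. It is not the HBIS module itself, whose internals are not described here. It assumes that heatmaps are a softmax over dot-product similarities, that class embeddings are updated as heatmap-weighted means of pixel features, and that pixel features are refined by adding a heatmap-weighted mixture of class embeddings. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_heatmaps(pixels, embeddings):
    """For each pixel feature, return a probability per class.

    Assumption: similarity is a plain dot product followed by a softmax
    over classes. Row n, column k is the heatmap value of class k at pixel n.
    """
    return [softmax([dot(p, e) for e in embeddings]) for p in pixels]


def update_embeddings(pixels, heatmaps, num_classes):
    """Feature -> embedding direction (assumed form).

    Each class embedding becomes the heatmap-weighted mean of pixel features.
    """
    dim = len(pixels[0])
    updated = []
    for k in range(num_classes):
        weight = sum(h[k] for h in heatmaps)  # always > 0 after softmax
        updated.append([
            sum(h[k] * p[d] for h, p in zip(heatmaps, pixels)) / weight
            for d in range(dim)
        ])
    return updated


def refine_pixels(pixels, heatmaps, embeddings):
    """Embedding -> feature direction (assumed form).

    Each pixel feature receives a residual equal to the heatmap-weighted
    mixture of class embeddings.
    """
    dim = len(pixels[0])
    return [
        [p[d] + sum(h[k] * embeddings[k][d] for k in range(len(embeddings)))
         for d in range(dim)]
        for p, h in zip(pixels, heatmaps)
    ]
```

In this sketch the heatmap is the only channel through which the two representations interact, so the same class-by-pixel map can be inspected directly or compared against labels.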
📝 Abstract
High-resolution remote sensing image semantic segmentation (HRSS) is a fundamental task in Earth observation. However, it has long been hampered by high inter-class similarity and large intra-class variability. Existing approaches often struggle to inject abstract yet strongly discriminative semantic knowledge into pixel-level feature learning, which leads to blurred boundaries and class confusion in complex scenes. To address these challenges, we propose the Bidirectional Co-Refinement Framework for HRSS (BiCoR-Seg). Specifically, we design a Heatmap-driven Bidirectional Information Synergy Module (HBIS), which generates class-level heatmaps to establish a bidirectional information flow between feature maps and class embeddings. Building on HBIS, we introduce a hierarchical supervision strategy in which the interpretable heatmaps generated by each HBIS module are used directly as low-resolution segmentation predictions for supervision, thereby enhancing the discriminative capacity of shallow features. In addition, to further improve the discriminability of the embedding representations, we propose a cross-layer class embedding Fisher Discriminative Loss that enforces intra-class compactness and enlarges inter-class separability. Extensive experiments on the LoveDA, Vaihingen, and Potsdam datasets demonstrate that BiCoR-Seg achieves outstanding segmentation performance while offering stronger interpretability. The code is available at https://github.com/ShiJinghao566/BiCoR-Seg.
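The Fisher-style objective mentioned above (intra-class compactness versus inter-class separation) can be illustrated with a small pure-Python sketch. The abstract does not give the exact formulation, so this is only one plausible reading: it assumes each layer produces one embedding per class, treats the embeddings of a class across layers as that class's samples, and returns the ratio of within-class scatter to between-class scatter. The function name `fisher_ratio_loss` and the `eps` parameter are hypothetical.

```python
def fisher_ratio_loss(embeddings_by_layer, eps=1e-8):
    """Ratio of within-class to between-class scatter (lower is better).

    embeddings_by_layer[l][k] is the embedding (list of floats) of class k
    produced at layer l. This layout is an assumption for illustration.
    """
    num_layers = len(embeddings_by_layer)
    num_classes = len(embeddings_by_layer[0])
    dim = len(embeddings_by_layer[0][0])

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Per-class centroid, averaged over layers.
    centroids = [
        mean([layer[k] for layer in embeddings_by_layer])
        for k in range(num_classes)
    ]
    global_centroid = mean(centroids)

    # Within-class scatter: how far each layer's embedding is from its
    # class centroid (small when a class is consistent across layers).
    within = sum(
        sq_dist(layer[k], centroids[k])
        for layer in embeddings_by_layer
        for k in range(num_classes)
    ) / (num_layers * num_classes)

    # Between-class scatter: how far class centroids are from each other,
    # measured through their distance to the global centroid.
    between = sum(sq_dist(c, global_centroid) for c in centroids) / num_classes

    return within / (between + eps)
```

Minimizing such a ratio pulls the per-layer embeddings of a class together while pushing the class centroids apart, which matches the compactness and separability goals stated in the abstract.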