🤖 AI Summary
Problem: Fine-grained urban Local Climate Zone (LCZ) classification is hindered by the substantial physical differences between Synthetic Aperture Radar (SAR) and multispectral remote sensing data, which make cross-modal feature fusion inefficient and limit accuracy.
Method: This paper proposes BP-LCZ, a physics-aware multimodal fusion framework. It introduces a Band Group Prompting (BGP) strategy that aligns visual and textual representations at the level of band groups, and a Multivariate Supervised Matrix (MSM)-based training strategy that mitigates positive–negative sample confusion. The framework combines text-prompt-guided learning, joint encoding of multi-source remote sensing data, and MSM-based supervision.
Results: Extensive experiments across multiple urban regions demonstrate significant improvements in LCZ classification accuracy, achieving state-of-the-art (SOTA) performance. The results validate both the effectiveness of physics-informed fusion and the framework’s strong cross-regional generalization capability.
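The group-level vision–language alignment described under Method can be illustrated with a minimal sketch. Everything specific here is a hypothetical assumption, not the paper's design: the band grouping in `BAND_GROUPS`, the prompt template in `make_prompts`, the temperature of 0.07, and the choice to average a CLIP-style cross-entropy over band groups. Real image and text encoders are replaced by hand-written toy embeddings.

```python
import math

# HYPOTHETICAL band grouping (not taken from the paper): SAR polarizations
# and multispectral bands split by spectral region.
BAND_GROUPS = {
    "SAR": ["VV", "VH"],
    "visible": ["B2", "B3", "B4"],
    "near-infrared": ["B8"],
}


def make_prompts(group, class_names):
    """Build one text prompt per LCZ class for a band group (hypothetical template)."""
    return [f"a {group} image of {name}" for name in class_names]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def group_alignment_loss(visual, text, label, temperature=0.07):
    """Average over band groups of the cross-entropy for matching each group's
    visual embedding to the prompt embedding of the true class.

    visual[g]  : visual embedding of band group g for one sample
    text[g][c] : text embedding of the prompt for group g and class c
    label      : index of the sample's ground-truth LCZ class
    """
    total = 0.0
    for group, v in visual.items():
        logits = [cosine(v, t) / temperature for t in text[group]]
        total += -log_softmax(logits)[label]
    return total / len(visual)


if __name__ == "__main__":
    classes = ["compact high-rise", "dense trees"]
    for group in BAND_GROUPS:
        print(make_prompts(group, classes))

    # Toy 2-D embeddings standing in for encoder outputs.
    visual = {"SAR": [1.0, 0.0], "visible": [0.0, 1.0]}
    text = {
        "SAR": [[1.0, 0.0], [0.0, 1.0]],
        "visible": [[0.0, 1.0], [1.0, 0.0]],
    }
    # Both groups' visual embeddings match the class-0 prompts, so the
    # loss is small for label 0 and large for label 1.
    print(group_alignment_loss(visual, text, label=0))
    print(group_alignment_loss(visual, text, label=1))
```

Scoring each band group against its own prompts, rather than only the fused feature against one prompt per class, is what makes the alignment "group-level" in this sketch; how BP-LCZ actually combines the group terms is not stated in the summary.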
📝 Abstract
Local climate zone (LCZ) classification is of great value for understanding the complex interactions between urban development and local climate. Recent studies have increasingly focused on fusing synthetic aperture radar (SAR) and multispectral data to improve LCZ classification performance. However, this fusion remains challenging because of the distinct physical properties of the two data types and the absence of effective fusion guidance. In this paper, a novel band prompting aided data fusion framework for LCZ classification, namely BP-LCZ, is proposed. It uses textual prompts associated with band groups to guide the model in learning the physical attributes of different bands and the semantics of the various categories inherent in SAR and multispectral data, so as to augment the fused features and thereby enhance LCZ classification performance. Specifically, a band group prompting (BGP) strategy is introduced to align the visual representation effectively at the level of band groups, which also enables a more thorough extraction of the semantic information of different bands with the help of textual information. In addition, a multivariate supervised matrix (MSM)-based training strategy is proposed to alleviate the problem of positive and negative sample confusion by completing the supervision information. The experimental results demonstrate the effectiveness and superiority of the proposed data fusion framework.
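One plausible reading of "completing the supervision information" is the multi-positive reformulation of CLIP-style contrastive training sketched below. With a one-hot (identity) target, two samples of the same LCZ class in a batch are scored as negatives of each other; a supervision matrix that marks every same-class pair as positive removes that conflict. This is an assumption, not the paper's MSM: the names `multi_positive_targets` and `soft_contrastive_loss` and the toy logits are hypothetical, and the "multivariate" aspect of MSM (for example, how it might span band groups or modalities) is not modeled.

```python
import math


def identity_targets(n):
    """Standard CLIP-style target: only the diagonal pair (i, i) is positive."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def multi_positive_targets(labels):
    """HYPOTHETICAL supervision matrix: pair (i, j) is positive whenever
    samples i and j share an LCZ class, so same-class samples in a batch
    are no longer treated as negatives of each other."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)] for i in range(n)]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(z - m) for z in row))
    return [z - lse for z in row]


def soft_contrastive_loss(logits, targets):
    """Mean row-wise cross-entropy between softmax(logits[i]) and the
    row-normalized target distribution targets[i]."""
    total = 0.0
    for logit_row, target_row in zip(logits, targets):
        row_sum = sum(target_row)
        log_p = log_softmax(logit_row)
        total -= sum((t / row_sum) * lp for t, lp in zip(target_row, log_p))
    return total / len(logits)


if __name__ == "__main__":
    labels = [0, 0, 1]  # samples 0 and 1 share an LCZ class
    # logits[i][j]: similarity between visual sample i and the text of sample j.
    same_class_close = [[5.0, 5.0, 0.0], [5.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
    same_class_apart = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

    eye = identity_targets(len(labels))
    msm = multi_positive_targets(labels)
    print("identity:", soft_contrastive_loss(same_class_close, eye),
          soft_contrastive_loss(same_class_apart, eye))
    print("multi-positive:", soft_contrastive_loss(same_class_close, msm),
          soft_contrastive_loss(same_class_apart, msm))
```

In this toy batch the identity target assigns a lower loss when samples 0 and 1 are pushed apart, even though they share a class, which is the kind of positive–negative confusion the abstract describes; the multi-positive target reverses that preference.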