AI Summary
This study addresses the challenge of building change detection in optical remote sensing imagery, where variations in illumination, seasonal conditions, and surface materials often hinder accurate identification of subtle structural changes from RGB data alone. To tackle this issue, the authors introduce LSMD, the first large-scale, high-resolution, and precisely co-registered multi-modal bi-temporal benchmark dataset for building change detection. They further propose the Multi-modal Spectral Complementarity Network (MSCNet), which leverages neighborhood context enhancement, cross-modal alignment and interaction, and saliency-aware multisource refinement to fully exploit the heterogeneous complementarity between RGB and near-infrared modalities. Experimental results demonstrate that MSCNet significantly outperforms existing methods on LSMD, achieving superior accuracy and robustness in fine-grained building change detection under complex real-world scenarios.
Abstract
Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that complement visible light, enhancing the discriminability of building materials and tiny structures and improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution, accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark targeting small changes in realistic scenarios, which provides a rigorous platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM), which strengthens local spatial details; the Cross-modal Alignment and Interaction Module (CAIM), which enables deep interaction between RGB and NIR features; and the Saliency-aware Multisource Refinement Module (SMRM), which progressively refines the fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD
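The abstract does not specify how CAIM's RGB-NIR interaction is computed; a common choice for this kind of cross-modal fusion is scaled dot-product cross-attention, where tokens from one modality query the other and the result is added back residually. The sketch below is a minimal, hypothetical NumPy illustration of that idea, not the paper's actual implementation; the function name `cross_modal_attention` and all tensor sizes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb_feat, nir_feat):
    """Hypothetical cross-attention: RGB tokens query NIR tokens.

    rgb_feat, nir_feat: (N, C) arrays of N spatial tokens with C channels
    (a feature map flattened over its spatial dimensions).
    Returns RGB features enriched with NIR context via a residual add.
    """
    c = rgb_feat.shape[1]
    # Attention weights between every RGB token and every NIR token.
    attn = softmax(rgb_feat @ nir_feat.T / np.sqrt(c), axis=-1)  # (N, N)
    # Aggregate NIR features per RGB token and fuse residually.
    return rgb_feat + attn @ nir_feat

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))  # 16 tokens, 8 channels (toy sizes)
nir = rng.standard_normal((16, 8))
fused = cross_modal_attention(rgb, nir)
print(fused.shape)  # (16, 8)
```

In a real network the queries, keys, and values would pass through learned projections and the fusion would typically run in both directions (RGB→NIR and NIR→RGB); this toy version only shows the core attention-and-residual pattern.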