Multi-Modal Building Change Detection for Large-Scale Small Changes: Benchmark and Baseline

📅 2026-03-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the challenge of building change detection in optical remote sensing imagery, where variations in illumination, seasonal conditions, and surface materials often hinder accurate identification of subtle structural changes using RGB data alone. To tackle this issue, the authors introduce LSMD, the first large-scale, high-resolution, and precisely co-registered multi-modal bi-temporal benchmark dataset for building change detection. They further propose the Multi-modal Spectral Complementarity Network (MSCNet), which leverages neighborhood context enhancement, cross-modal alignment and interaction, and saliency-aware multisource refinement to fully exploit the heterogeneous complementarity between the RGB and near-infrared modalities. Experimental results demonstrate that MSCNet significantly outperforms existing methods on LSMD, achieving superior accuracy and robustness in fine-grained building change detection under complex real-world scenarios.

πŸ“ Abstract
Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that are complementary to visible light, thereby enhancing the discriminability of building materials and tiny structures while improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution and accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark dataset targeting small changes in realistic scenarios, providing a rigorous testing platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM) to strengthen local spatial details, the Cross-modal Alignment and Interaction Module (CAIM) to enable deep interaction between RGB and NIR features, and the Saliency-aware Multisource Refinement Module (SMRM) to progressively refine fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD
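The abstract's core idea is that NIR features carry physical cues (e.g., material reflectance) that can suppress RGB pseudo-changes. As a rough illustration of that kind of cross-modal gating, the sketch below re-weights RGB feature channels with a gate derived from the NIR modality. This is a minimal, hypothetical numpy sketch for intuition only; the function name and gating scheme are our own and do not reproduce the paper's NCEM/CAIM/SMRM modules.

```python
import numpy as np

def nir_gated_fusion(rgb_feat: np.ndarray, nir_feat: np.ndarray) -> np.ndarray:
    """Illustrative RGB-NIR feature fusion (not the authors' MSCNet).

    Both inputs are (C, H, W) feature maps from the same image pair.
    A channel-wise sigmoid gate computed from the NIR features
    re-weights the RGB features before a residual NIR term is added,
    loosely mirroring the idea of using NIR cues to disambiguate
    illumination-induced pseudo-changes in the RGB stream.
    """
    # Global average pooling gives a per-channel NIR descriptor
    nir_desc = nir_feat.mean(axis=(1, 2))        # shape (C,)
    gate = 1.0 / (1.0 + np.exp(-nir_desc))       # sigmoid gate in (0, 1)
    # NIR-gated RGB features plus a residual NIR contribution
    return gate[:, None, None] * rgb_feat + nir_feat

# Toy bi-temporal feature maps: 8 channels, 16x16 spatial grid
rgb = np.random.rand(8, 16, 16)
nir = np.random.rand(8, 16, 16)
fused = nir_gated_fusion(rgb, nir)
print(fused.shape)  # (8, 16, 16)
```

In a full change-detection pipeline this fusion would be applied to both temporal images before differencing; here it only shows the gating mechanic at a single time step.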
Problem

Research questions and friction points this paper is trying to address.

change detection
multi-modal
building
small changes
remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal change detection
RGB-NIR fusion
building change detection
cross-modal alignment
small-change detection
Ye Wang
MOE Key Lab of ICSP, Anhui Provincial Key Lab of Multimodal Cognitive Computation, IMIS Lab of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei, China
Wei Lu
Professor and Chair of Computer Science, Keene State College/USNH
cybersecurity, data science, artificial intelligence
Zhihui You
School of Public Safety and Emergency Management, Anhui University of Science and Technology, Hefei 231131, China
Keyan Chen
College of Computing and Data Science, Nanyang Technological University, Singapore 639798
Tongfei Liu
Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
Kaiyu Li
Wilfrid Laurier University, Canada
Data governance and Data preparation, Data market and Data economy
Hongruixuan Chen
The University of Tokyo, RIKEN
Deep Learning, Computer Vision, GeoAI, AI4EO, Multimodal Remote Sensing
Qingling Shu
MOE Key Lab of ICSP, Anhui Provincial Key Lab of Multimodal Cognitive Computation, IMIS Lab of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei, China
Sibao Chen
MOE Key Lab of ICSP, Anhui Provincial Key Lab of Multimodal Cognitive Computation, IMIS Lab of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei, China