🤖 AI Summary
This work addresses two challenges in multimodal change detection with optical and SAR remote sensing imagery: insufficient cross-modal interaction and underutilized modality-specific features. To this end, we propose STSF-Net, a novel framework that jointly models modality-specific characteristics alongside spatio-temporal commonalities, complemented by an adaptive fusion mechanism leveraging semantic priors from pretrained foundation models. Our key contributions include the creation of Delta-SN6, the first high-resolution, fully polarimetric SAR and optical benchmark dataset for multiclass change detection, and a semantics-guided strategy for efficient multimodal fusion. Experimental results demonstrate that our method achieves state-of-the-art performance, surpassing existing approaches by 3.21%, 1.08%, and 1.32% in mIoU on the Delta-SN6, BRIGHT, and Wuhan-Het datasets, respectively. The code and the Delta-SN6 dataset will be publicly released.
📝 Abstract
Multimodal change detection (MMCD) identifies changed areas in multimodal remote sensing (RS) data and has significant application value in land use monitoring, disaster assessment, and sustainable urban development. However, existing MMCD approaches exhibit limitations in cross-modal interaction and in exploiting modality-specific characteristics. This leads to insufficient modeling of fine-grained change information, hindering the precise detection of semantic changes in multimodal data. To address these problems, we propose STSF-Net, a framework designed for MMCD between optical and SAR images. STSF-Net jointly models modality-specific and spatio-temporal common features to enhance change representations. Specifically, modality-specific features are exploited to capture genuine semantic change signals, while spatio-temporal common features are embedded to suppress pseudo-changes caused by differences in imaging mechanisms. Furthermore, we introduce an optical-SAR feature fusion strategy that adaptively adjusts feature importance based on semantic priors obtained from pretrained foundation models, enabling semantics-guided adaptive fusion of multimodal information. In addition, we introduce the Delta-SN6 dataset, the first openly accessible multiclass MMCD benchmark consisting of very-high-resolution (VHR) fully polarimetric SAR and optical images. Experimental results on the Delta-SN6, BRIGHT, and Wuhan-Het datasets demonstrate that our method outperforms the state of the art (SOTA) by 3.21%, 1.08%, and 1.32% in mIoU, respectively. The associated code and Delta-SN6 dataset will be released at: https://github.com/liuxuanguang/STSF-Net.