Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of insufficient cross-modal interaction and underutilized modality-specific features in multimodal change detection using optical and SAR remote sensing imagery. To this end, we propose STSF-Net, a novel framework that jointly models modality-specific characteristics alongside spatio-temporal commonalities, complemented by an adaptive fusion mechanism leveraging semantic priors from pretrained foundation models. Our key contributions include the creation of Delta-SN6, the first high-resolution, fully polarimetric SAR and optical image benchmark dataset for multiclass change detection, and the development of a semantics-guided strategy for efficient multimodal fusion. Experimental results demonstrate that our method achieves state-of-the-art performance, surpassing existing approaches by 3.21%, 1.08%, and 1.32% in mIoU on the Delta-SN6, BRIGHT, and Wuhan-Het datasets, respectively. Both the code and dataset are publicly released.
📝 Abstract
Multimodal change detection (MMCD) identifies changed areas in multimodal remote sensing (RS) data, demonstrating significant application value in land use monitoring, disaster assessment, and sustainable urban development. However, existing MMCD approaches exhibit limitations in cross-modal interaction and in exploiting modality-specific characteristics. This leads to insufficient modeling of fine-grained change information, thus hindering the precise detection of semantic changes in multimodal data. To address these problems, we propose STSF-Net, a framework designed for MMCD between optical and SAR images. STSF-Net jointly models modality-specific and spatio-temporal common features to enhance change representations. Specifically, modality-specific features are exploited to capture genuine semantic change signals, while spatio-temporal common features are embedded to suppress pseudo-changes caused by differences in imaging mechanisms. Furthermore, we introduce an optical-SAR feature fusion strategy that adaptively adjusts feature importance based on semantic priors obtained from pretrained foundation models, enabling semantics-guided adaptive fusion of multimodal information. In addition, we introduce the Delta-SN6 dataset, the first openly accessible multiclass MMCD benchmark consisting of very-high-resolution (VHR) fully polarimetric SAR and optical images. Experimental results on the Delta-SN6, BRIGHT, and Wuhan-Het datasets demonstrate that our method outperforms the state of the art (SOTA) by 3.21%, 1.08%, and 1.32% in mIoU, respectively. The associated code and Delta-SN6 dataset will be released at: https://github.com/liuxuanguang/STSF-Net.
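The abstract describes weighting optical and SAR features by semantic priors from a foundation model. A minimal sketch of what such prior-gated fusion could look like, assuming sigmoid gates projected from a pooled prior embedding (the function and projection matrices here are illustrative, not the authors' STSF-Net implementation):

```python
import numpy as np

def semantic_gated_fusion(f_opt, f_sar, prior, w_opt, w_sar):
    """Blend optical and SAR feature maps using gates derived from a semantic prior.

    f_opt, f_sar : (C, H, W) feature maps from the two modalities
    prior        : (D,) semantic prior vector (e.g. a pooled foundation-model embedding)
    w_opt, w_sar : (C, D) learned projections mapping the prior to per-channel gates
    """
    # Per-channel gates in (0, 1); normalizing makes the two modalities compete.
    g_opt = 1.0 / (1.0 + np.exp(-(w_opt @ prior)))   # (C,)
    g_sar = 1.0 / (1.0 + np.exp(-(w_sar @ prior)))   # (C,)
    total = g_opt + g_sar + 1e-8
    g_opt, g_sar = g_opt / total, g_sar / total
    # Broadcast the gates over the spatial dimensions and blend the modalities.
    return g_opt[:, None, None] * f_opt + g_sar[:, None, None] * f_sar

# Toy usage with random tensors
rng = np.random.default_rng(0)
C, H, W, D = 4, 8, 8, 16
fused = semantic_gated_fusion(
    rng.standard_normal((C, H, W)), rng.standard_normal((C, H, W)),
    rng.standard_normal(D), rng.standard_normal((C, D)), rng.standard_normal((C, D)))
print(fused.shape)  # (4, 8, 8)
```

Because the gates are normalized to sum to one per channel, the fusion is a convex combination: where the prior favors one modality, that modality's features dominate the fused map.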
Problem

Research questions and friction points this paper is trying to address.

Multimodal Change Detection
Optical-SAR Images
Cross-modal Interaction
Modality-specific Characteristics
Semantic Change Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal change detection
semantic-guided fusion
modality-specific features
spatio-temporal common features
foundation model priors
🔎 Similar Papers
No similar papers found.
Xuanguang Liu
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Lei Ding
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Yujie Li
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Chenguang Dai
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Zhenchao Zhang
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Mengmeng Li
Fuzhou University
Remote sensing image analysis, VHR optical, (polarimetric) SAR, time series, uncertainty handling
Ziyi Yang
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Yifan Sun
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Yongqi Sun
Institute of Geospatial Information, Information Engineering University, Zhengzhou, China
Hanyun Wang
National University of Defense Technology
Computer vision, pattern recognition, 3D point cloud processing, image processing