M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-modal change detection between optical and synthetic aperture radar (SAR) remote sensing imagery remains challenging in extreme scenarios such as disaster emergency response, primarily because the substantial distributional heterogeneity between the two modalities renders conventional shared-weight Siamese networks inadequate. Method: We propose an Optical-to-SAR guided path (O2SP) coupled with a self-distillation mechanism, integrated with a gated Mixture-of-Experts (MoE) module that explicitly models multimodal feature distributions. The framework supports both CNN- and Transformer-based backbones, including MiT-b1, to jointly achieve cross-modal alignment and knowledge transfer. Contribution/Results: On optical-SAR change detection benchmarks, the MiT-b1 variant outperforms all state-of-the-art methods in both accuracy and robustness, establishing a new paradigm for remote sensing monitoring under extreme conditions.
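The paper describes, at a high level, a gated MoE module inserted into the backbone so that optical and SAR features are routed through modality-aware experts. As a rough, non-authoritative sketch of such a block (the class name `GatedMoE`, the number of experts, and the convolutional expert design are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMoE(nn.Module):
    """Gated Mixture-of-Experts block (sketch): a soft gate computes
    per-sample weights and mixes the outputs of several expert
    convolutions, letting different experts specialize per modality."""

    def __init__(self, channels: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_experts)
        )
        # Gate: global average pool -> linear -> softmax over experts.
        self.gate = nn.Linear(channels, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); gate weights: (B, E)
        weights = F.softmax(self.gate(x.mean(dim=(2, 3))), dim=-1)
        # Stack expert outputs: (B, E, C, H, W), then mix by the gate.
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights[:, :, None, None, None] * expert_outs).sum(dim=1)
```

A dense soft gate like this keeps the block fully differentiable; a sparse top-k gate would cut compute per sample at the cost of a harder routing problem.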

📝 Abstract
Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR), with its active imaging capability, is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to effectively learn the cross-modal data distribution between optical and SAR images. To address this challenge, we propose a unified MultiModal CD framework, M$^2$CD. We integrate Mixture of Experts (MoE) modules into the backbone to explicitly handle diverse modalities, thereby enhancing the model's ability to learn multimodal data distributions. Additionally, we innovatively propose an Optical-to-SAR guided path (O2SP) and implement self-distillation during training to reduce the feature space discrepancy between different modalities, further alleviating the model's learning burden. We design multiple variants of M$^2$CD based on both CNN and Transformer backbones. Extensive experiments validate the effectiveness of the proposed framework, with the MiT-b1 version of M$^2$CD outperforming all state-of-the-art (SOTA) methods in optical-SAR CD tasks.
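To make the Optical-to-SAR guidance and self-distillation idea concrete, below is a minimal PyTorch sketch of one plausible training-time objective in which optical-branch features act as a detached teacher for the SAR branch. The function name `o2sp_distill_loss`, the per-stage MSE, and the detach-based gradient stopping are assumptions; the abstract does not specify the exact loss.

```python
import torch
import torch.nn.functional as F

def o2sp_distill_loss(optical_feats, sar_feats):
    """Self-distillation sketch: pull SAR-branch features toward the
    corresponding optical-branch features at each backbone stage.

    optical_feats, sar_feats: lists of tensors with matching shapes,
    one per backbone stage. Optical features are detached so the
    guidance is one-directional (optical -> SAR)."""
    assert len(optical_feats) == len(sar_feats)
    loss = sum(
        F.mse_loss(f_sar, f_opt.detach())
        for f_opt, f_sar in zip(optical_feats, sar_feats)
    )
    return loss / len(optical_feats)
```

A term like this would be added to the ordinary change-detection loss during training and dropped at inference, so the deployed network pays no extra cost for the guidance path.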
Problem

Research questions and friction points this paper is trying to address.

Handles optical-SAR cross-modal change detection challenges
Improves multimodal data distribution learning via MoE
Reduces feature space discrepancy with self-distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts for multimodal learning
Optical-to-SAR guided path (O2SP)
Self-distillation to reduce feature discrepancy
Ziyuan Liu
Unknown affiliation
Robotics · Manipulation and Grasping · Computer Vision · Machine Learning
Jiawei Zhang
Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
Wenyu Wang
College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
Yuantao Gu
Department of Electronic Engineering, Tsinghua University
Signal processing · Sparse recovery · Sparse learning · Optimization · Graph signal processing