🤖 AI Summary
To address the high spatiotemporal complexity, strong cross-modal heterogeneity, and noise sensitivity that hamper unsupervised change detection (UCD) in multimodal remote sensing imagery, this paper proposes a Semantic-to-Change (S2C) learning framework. The framework pioneers the transfer of implicit semantic knowledge from vision foundation models (VFMs) into the change representation space, explicitly modeling temporal differences via triplet contrastive learning. To enhance noise robustness, it introduces joint spatial-spectral random perturbations. Furthermore, a grid-based sparsity regularization and an IoU-matching algorithm are designed to refine the accuracy of the change maps. Fully unsupervised, S2C is compatible with diverse VFMs and backbone architectures. Evaluated on four benchmark datasets, it achieves an average F1-score gain of 17.3% over state-of-the-art methods, with a maximum improvement of 31.0%, demonstrating substantial advances in robustness and sample efficiency.
📝 Abstract
Unsupervised Change Detection (UCD) in multimodal Remote Sensing (RS) images remains challenging due to the inherent spatio-temporal complexity of the data and the heterogeneity arising from different imaging sensors. Inspired by recent advancements in Visual Foundation Models (VFMs) and Contrastive Learning (CL), this research develops CL methodologies that translate the implicit knowledge in VFMs into change representations, thus eliminating the need for explicit supervision. To this end, we introduce a Semantic-to-Change (S2C) learning framework for UCD in both homogeneous and multimodal RS images. Unlike existing CL methodologies, which typically focus on learning multi-temporal similarities, we introduce a novel triplet learning strategy that explicitly models temporal differences, which are crucial to the CD task. Furthermore, random spatial and spectral perturbations are introduced during training to enhance robustness to temporal noise. In addition, a grid sparsity regularization is defined to suppress insignificant changes, and an IoU-matching algorithm is developed to refine the CD results. Experiments on four benchmark CD datasets demonstrate that the proposed S2C learning framework achieves significant improvements in accuracy, surpassing the current state of the art by over 31%, 9%, 23%, and 15%, respectively. It also demonstrates robustness and sample efficiency, making it suitable for training and adapting various VFMs or backbone neural networks. The relevant code will be available at: github.com/DingLei14/S2C.
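To make two of the core ideas concrete, the following is a minimal NumPy sketch of (a) a triplet objective that pulls an anchor feature toward its perturbed same-date view while pushing it away from the other-date view, thereby modeling temporal differences explicitly, and (b) a grid-level sparsity term that penalizes the mean change response per grid cell. All function names, the margin value, and the grid size are illustrative assumptions; the actual S2C losses, feature extractors, and weighting are defined in the paper and its code release.

```python
import numpy as np

def triplet_change_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical triplet loss over per-pixel feature vectors.

    anchor/positive: features of the same scene at the same date
    (positive is a randomly perturbed view); negative: features of
    the other acquisition date. Minimizing this pulls anchor toward
    positive and pushes it beyond `margin` from negative, so that
    the embedding distance reflects temporal change.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def grid_sparsity_reg(change_map, grid=8):
    """Hypothetical grid sparsity term: average change magnitude per
    non-overlapping grid cell, averaged over cells. Adding this to
    the training objective discourages spurious low-magnitude
    responses, keeping most cells close to 'unchanged'."""
    h, w = change_map.shape
    h_c, w_c = h - h % grid, w - w % grid  # crop to a multiple of the grid
    cells = change_map[:h_c, :w_c].reshape(h_c // grid, grid, w_c // grid, grid)
    return cells.mean(axis=(1, 3)).mean()
```

For example, a perturbed same-date view that stays close to the anchor while the other-date view is far away incurs zero triplet loss, whereas swapping the roles of positive and negative produces a positive penalty.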