🤖 AI Summary
This work addresses the limitations of existing remote sensing semantic change detection methods, which suffer from insufficient semantic understanding and redundant paradigms. To overcome these challenges, we propose an efficient framework built upon the remote sensing foundation model PerA, featuring a modular cascaded gated decoder (CG-Decoder) that streamlines the detection pipeline and enhances multi-scale feature interaction. Additionally, we introduce a soft semantic consistency loss (SSCLoss) to stabilize training. The proposed approach substantially simplifies conventional architectures while achieving state-of-the-art performance on two public remote sensing datasets, demonstrating its effectiveness, generalizability, and compatibility with diverse visual encoders.
📝 Abstract
Remote sensing (RS) change detection methods extract critical information on surface dynamics and are an essential means for humans to understand changes in the Earth's surface and environment. Among these methods, semantic change detection (SCD) can more effectively interpret the multi-class information contained in bi-temporal RS imagery, providing semantic-level predictions that support dynamic change monitoring. However, due to the limited semantic understanding capability of existing models and the inherent complexity of the SCD task, existing SCD methods face significant challenges in both performance and paradigm complexity. In this paper, we propose PerASCD, an SCD method driven by the RS foundation model PerA, designed to enhance multi-scale semantic understanding and overall performance. We introduce a modular Cascaded Gated Decoder (CG-Decoder) that simplifies complex SCD decoding pipelines while promoting effective multi-level feature interaction and fusion. In addition, we propose a Soft Semantic Consistency Loss (SSCLoss) to mitigate the numerical instability commonly encountered during SCD training. We further explore the applicability of multiple existing RS foundation models to the SCD task when equipped with the proposed decoder. Experimental results demonstrate that our decoder not only effectively simplifies the SCD paradigm but also adapts seamlessly to various vision encoders. Our method achieves state-of-the-art (SOTA) performance on two public benchmark datasets, validating its effectiveness. The code is available at https://github.com/SathShen/PerASCD.git.