🤖 AI Summary
Existing remote sensing change detection models are supervised mainly by binary change maps, so they emphasize pixel-level differences between bi-temporal images and lack semantic understanding of the changed landscape. This reduces their accuracy under noise and illumination variations. To address this, the authors propose SA-CDNet, a Semantic-Aware Change Detection Network that jointly models appearance discrepancy and scene semantics. It transfers semantic priors from a visual foundation model (FastSAM) and, inspired by human vision, uses a dual-stream feature decoder that combines semantic-aware and difference-aware features. The authors further propose a single-temporal semantic pre-training strategy: pseudo-change detection samples are constructed from public single-temporal semantic segmentation datasets, and an extra branch handles a proxy semantic segmentation task, so that semantic understanding and change detection are optimized together. Experiments on five remote sensing benchmarks show that SA-CDNet outperforms existing state-of-the-art methods. The source code is publicly available.
📝 Abstract
When given two similar images, humans identify their differences by comparing appearance (e.g., color, texture) with the help of semantics (e.g., objects, relations). However, mainstream change detection models adopt a supervised training paradigm in which the annotated binary change map is the main constraint. As a result, these methods primarily emphasize difference-aware features between bi-temporal images and neglect semantic understanding of the changed landscapes, which undermines accuracy in the presence of noise and illumination variations. To this end, this paper explores incorporating semantic priors to improve change detection. First, we propose a Semantic-Aware Change Detection network, SA-CDNet, which transfers the common knowledge of visual foundation models (i.e., FastSAM) to change detection. Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features. Second, we design a single-temporal semantic pre-training strategy to enhance the semantic understanding of landscapes, which yields further gains. Specifically, we construct pseudo-change detection data from public single-temporal remote sensing segmentation datasets for large-scale pre-training, where an extra branch is also introduced for the proxy semantic segmentation task. Experimental results on five challenging benchmarks demonstrate the superiority of our method over existing state-of-the-art methods. The code is available at https://github.com/thislzm/SA-CD.
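The abstract does not say how the pseudo-change detection data are built from single-temporal segmentation datasets. The following is a minimal sketch of one plausible construction, assuming a copy-paste scheme: pixels of one class are pasted from a second image into the first, and the change label marks pixels whose semantic class differs afterwards. The function name `make_pseudo_change_pair`, its arguments, and the copy-paste rule are all hypothetical and are not taken from the SA-CD repository.

```python
import numpy as np

def make_pseudo_change_pair(img_a, seg_a, img_b, seg_b, target_class):
    """Build a pseudo bi-temporal sample from two single-temporal samples.

    Assumed scheme (not from the paper): paste every pixel that has
    `target_class` in seg_b from img_b into a copy of img_a.
    img_*: HxWx3 uint8 images, seg_*: HxW integer class maps.
    """
    paste = seg_b == target_class            # HxW boolean paste region

    img_t1 = img_a.copy()                    # "before" image: untouched
    img_t2 = img_a.copy()                    # "after" image: receives the paste
    img_t2[paste] = img_b[paste]

    seg_t1 = seg_a.copy()                    # proxy segmentation labels
    seg_t2 = seg_a.copy()
    seg_t2[paste] = target_class

    # Binary change label: 1 only where the semantic class actually differs.
    change = (seg_t1 != seg_t2).astype(np.uint8)
    return img_t1, img_t2, seg_t1, seg_t2, change


# Toy usage: a 4x4 background image and a donor image with a 2x2 "building".
img_a = np.zeros((4, 4, 3), dtype=np.uint8)
seg_a = np.zeros((4, 4), dtype=np.int64)
img_b = np.full((4, 4, 3), 255, dtype=np.uint8)
seg_b = np.zeros((4, 4), dtype=np.int64)
seg_b[1:3, 1:3] = 1

img_t1, img_t2, seg_t1, seg_t2, change = make_pseudo_change_pair(
    img_a, seg_a, img_b, seg_b, target_class=1
)
```

Under this scheme each synthetic sample carries both a binary change map and per-date segmentation maps, so a change detection head and a proxy segmentation branch could be supervised from the same sample. Pixels that are pasted over a region already labeled with the same class are not marked as changed, since only their appearance differs.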