🤖 AI Summary
Remote sensing change detection (RSCD) faces a key challenge under mask supervision: while spatial localization is accurate, semantic transition modeling between bi-temporal images remains inadequate due to temporal semantic inconsistency. To address this, we propose TaCo, the first network that integrates a text-guided transition generator with joint spatiotemporal semantic constraints to explicitly model semantic transformation relationships between bi-temporal features. Our method fuses textual semantics with bi-temporal visual representations and introduces dual feedback reconstruction and transitional discrimination constraints—enhancing semantic consistency without increasing inference overhead. Evaluated on six public benchmarks covering both binary and semantic change detection tasks, TaCo achieves state-of-the-art performance across all datasets, significantly outperforming existing methods.
📝 Abstract
Remote sensing change detection (RSCD) aims to identify surface changes across bi-temporal satellite images. Most previous methods rely solely on mask supervision, which effectively guides spatial localization but provides limited constraints on the temporal semantic transitions. Consequently, they often produce spatially coherent predictions while still suffering from unresolved semantic inconsistencies. To address this limitation, we propose TaCo, a spatio-temporal semantic consistent network, which enriches the existing mask-supervised framework with a spatio-temporal semantic joint constraint. TaCo conceptualizes change as a semantic transition between bi-temporal states, in which one temporal feature representation can be derived from the other via dedicated transition features. To realize this, we introduce a Text-guided Transition Generator that integrates textual semantics with bi-temporal visual features to construct the cross-temporal transition features. In addition, we propose a spatio-temporal semantic joint constraint consisting of bi-temporal reconstruct constraints and a transition constraint: the former enforces alignment between reconstructed and original features, while the latter enhances discrimination for changes. This design can yield substantial performance gains without introducing any additional computational overhead during inference. Extensive experiments on six public datasets, spanning both binary and semantic change detection tasks, demonstrate that TaCo consistently achieves SOTA performance.