🤖 AI Summary
Mamba-based architectures model global context well but capture local detail poorly in binary change detection for remote sensing imagery, which limits dense prediction accuracy. To address this, we propose CDMamba, a model that combines global and local features: (1) the scaled residual ConvMamba (SRCM) block pairs Mamba's global feature extraction with convolution that enhances local details; (2) the adaptive global–local guided fusion (AGLGF) block dynamically facilitates bi-temporal interaction, guiding each temporal feature with the global/local features of the other. In the reported experiments, F1/IoU scores improve by 2.10%/3.00% on LEVIR+CD and by 2.44%/2.91% on CLCD. The work integrates an explicit local perception mechanism into a Mamba-based model for remote sensing change detection.
📝 Abstract
Recently, the Mamba architecture based on state-space models has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD) tasks. However, most methods enhance the global receptive field by directly modifying the scanning mode of Mamba, neglecting the crucial role that local information plays in dense prediction tasks (e.g., binary CD). In this article, we propose a model called CDMamba, which effectively combines global and local features for handling binary CD tasks. Specifically, the scaled residual ConvMamba (SRCM) block uses Mamba to extract global features and convolution to enhance local details. This alleviates a weakness of current Mamba-based methods, which lack detailed clues and therefore struggle to achieve fine detection in dense prediction tasks. Furthermore, because CD requires interaction between bi-temporal features, the adaptive global–local guided fusion (AGLGF) block is proposed to dynamically facilitate this interaction, with each temporal feature guided by the global/local features of the other. Our intuition is that more discriminative change features can be acquired with the guidance of the other temporal features. Extensive experiments on five datasets demonstrate that our proposed CDMamba is competitive with current methods (for example, the F1/intersection over union (IoU) scores improve by 2.10%/3.00% on LEVIR+CD and by 2.44%/2.91% on CLCD). Our code is open-sourced at https://github.com/zmoka-zht/CDMamba.
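For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how F1 and IoU are commonly computed for the change class in binary CD. It assumes the standard definitions F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN), with masks given as nested lists of 0/1. The function name `binary_cd_scores` and the toy masks are hypothetical and are not taken from the CDMamba repository.

```python
def binary_cd_scores(pred, target):
    """Return (f1, iou) for the change class (label 1) of two binary masks.

    Hypothetical helper for illustration: `pred` and `target` are nested
    lists of 0/1 with identical shape.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1  # unchanged pixel flagged as changed
            elif p == 0 and t == 1:
                fn += 1  # changed pixel missed
    f1_denominator = 2 * tp + fp + fn
    iou_denominator = tp + fp + fn
    # Return 0.0 when neither mask contains any changed pixel.
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    iou = tp / iou_denominator if iou_denominator else 0.0
    return f1, iou


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
f1, iou = binary_cd_scores(pred, target)
```

Under these definitions the two scores are linked by F1 = 2·IoU / (1 + IoU), so a gain in one implies a gain in the other, though not by the same number of points.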