🤖 AI Summary
Multi-temporal remote sensing change detection struggles with fine-grained recognition because of heterogeneity and spatiotemporal misalignment, and existing sequential modeling approaches often break local structural consistency. To address this, the authors propose a structure-aware interleaved state-space modeling framework. First, a chessboard interleaving strategy with a snake scanning order preserves local structure across multi-temporal features and serializes them into a unified sequence within a single forward pass. Second, a multi-dilated convolution fusion module explicitly captures center-and-corner neighborhood context, improving robustness to misalignment. Third, a SpatialMamba encoder is combined with a lightweight cross-source interaction module for efficient fusion of heterogeneous temporal features. The method is reported to achieve state-of-the-art performance on binary change detection, semantic change detection, and multimodal building damage assessment, with improved change localization accuracy and cross-scenario generalization.
📝 Abstract
Change detection (CD) in multitemporal remote sensing imagery presents significant challenges for fine-grained recognition, owing to heterogeneity and spatiotemporal misalignment. Existing methods based on vision transformers or state-space models typically disrupt local structural consistency during temporal serialization, obscuring discriminative cues under misalignment and hindering reliable change localization. To address this, we introduce ChessMamba, a structure-aware framework leveraging interleaved state-space modeling for robust CD with multi-temporal inputs. ChessMamba integrates a SpatialMamba encoder with a lightweight cross-source interaction module, featuring two key innovations: (i) chessboard interleaving with a snake scanning order, which serializes multi-temporal features into a unified sequence within a single forward pass, thereby shortening interaction paths and enabling direct comparison for accurate change localization; and (ii) structure-aware fusion via multi-dilated convolutions, which selectively captures center-and-corner neighborhood contexts within each mono-temporal feature. Comprehensive evaluations on three CD tasks (binary CD, semantic CD, and multimodal building damage assessment) demonstrate that ChessMamba effectively fuses heterogeneous features and achieves substantial accuracy improvements over state-of-the-art methods. The relevant code will be available at: github.com/DingLei14/ChessMamba.
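The serialization idea in (i) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the exact layout, so the code below assumes one plausible reading in which two H x W token grids are woven into an H x 2W board whose cells alternate between the two dates like a chessboard, and the board is then read row by row in alternating directions. The function names `chessboard_interleave` and `snake_scan` are hypothetical and chosen only for this illustration.

```python
def chessboard_interleave(feat_a, feat_b):
    """Weave two H x W token grids into one H x 2W chessboard-like board.

    Assumed layout (not taken from the paper): even rows start with the
    token from date A, odd rows start with the token from date B.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both grids must have the same number of rows")
    board = []
    for r, (row_a, row_b) in enumerate(zip(feat_a, feat_b)):
        if len(row_a) != len(row_b):
            raise ValueError("both grids must have the same number of columns")
        row = []
        for a, b in zip(row_a, row_b):
            # Swap the order on odd rows so the two sources form a chessboard.
            row.extend((a, b) if r % 2 == 0 else (b, a))
        board.append(row)
    return board


def snake_scan(board):
    """Flatten a 2D board row by row, reversing every odd row."""
    sequence = []
    for r, row in enumerate(board):
        sequence.extend(row if r % 2 == 0 else reversed(row))
    return sequence


# Toy 2 x 2 grids; strings stand in for feature tokens of each date.
feat_a = [["a00", "a01"], ["a10", "a11"]]
feat_b = [["b00", "b01"], ["b10", "b11"]]

board = chessboard_interleave(feat_a, feat_b)
sequence = snake_scan(board)
print(sequence)
# ['a00', 'b00', 'a01', 'b01', 'a11', 'b11', 'a10', 'b10']
```

Under this assumed layout, each token sits next to the token from the other date at the same spatial location, and consecutive rows join at the same end of the board, so one sequence of length 2HW covers both dates in a single scan.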