ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model

📅 2024-04-04
🏛️ IEEE Transactions on Geoscience and Remote Sensing
📈 Citations: 34
Influential: 2
🤖 AI Summary
CNNs have limited receptive fields and Transformers carry high computational cost in remote sensing change detection. To address both, this paper applies Visual Mamba, a state space model-based architecture, to the task for the first time and proposes the ChangeMamba family of frameworks. The family covers three tasks: binary change detection (MambaBCD), semantic change detection (MambaSCD), and building damage assessment (MambaBDA). All three use Visual Mamba as the encoder, and the change decoder uses spatiotemporal relationship modeling mechanisms designed to combine naturally with Mamba so that multitemporal features can interact. On five benchmark datasets, the frameworks outperform current CNN- and Transformer-based methods without complex training strategies or tricks. Further experiments show that the architecture is robust to degraded data. The source code is publicly available.

📝 Abstract
Convolutional neural networks (CNNs) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNNs are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models (SSMs), has shown remarkable performance in a series of natural language processing tasks and can effectively compensate for the shortcomings of the above two architectures. In this article, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary CD (BCD), semantic CD (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is present in all three architectures, we propose three spatiotemporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its properties to achieve spatiotemporal interaction of multitemporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code is available at: https://github.com/ChenHongruixuan/MambaCD.
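The abstract does not spell out the three spatiotemporal relationship modeling mechanisms. As a rough illustration only, the sketch below assumes three plausible ways of arranging pre-event and post-event feature tokens before a sequence model scans them: back to back, interleaved per location, and concatenated per location. All function names are hypothetical and are not taken from the MambaCD repository. `linear_scan` is a toy fixed-decay recurrence standing in for a state space scan, not Mamba's selective scan.

```python
from typing import List, Sequence

Token = List[float]  # one feature vector per spatial location


def sequential_arrangement(pre: Sequence[Token], post: Sequence[Token]) -> List[Token]:
    """All pre-event tokens followed by all post-event tokens (length 2N)."""
    return [list(t) for t in pre] + [list(t) for t in post]


def cross_arrangement(pre: Sequence[Token], post: Sequence[Token]) -> List[Token]:
    """Interleave tokens per location: pre[0], post[0], pre[1], post[1], ... (length 2N)."""
    out: List[Token] = []
    for a, b in zip(pre, post):
        out.append(list(a))
        out.append(list(b))
    return out


def parallel_arrangement(pre: Sequence[Token], post: Sequence[Token]) -> List[Token]:
    """Concatenate both feature vectors at each location (length N, width 2C)."""
    return [list(a) + list(b) for a, b in zip(pre, post)]


def linear_scan(tokens: Sequence[Token], decay: float = 0.5) -> List[Token]:
    """Toy recurrence h_t = decay * h_{t-1} + x_t, applied per channel.

    One pass over the sequence (linear in its length) lets every output
    depend on all earlier tokens, which is the property the arrangements
    above are meant to exploit. Assumption: fixed scalar decay.
    """
    state = [0.0] * len(tokens[0])
    outputs: List[Token] = []
    for x in tokens:
        state = [decay * h + v for h, v in zip(state, x)]
        outputs.append(list(state))
    return outputs


# Two spatial locations, one channel each, for the two acquisition dates.
pre = [[1.0], [2.0]]
post = [[10.0], [20.0]]

seq = sequential_arrangement(pre, post)
cross = cross_arrangement(pre, post)
par = parallel_arrangement(pre, post)
scanned = linear_scan(cross)
```

In the interleaved arrangement, each post-event token is scanned immediately after the pre-event token from the same location, so the recurrent state mixes the two dates at every step. The back-to-back arrangement instead exposes the whole pre-event image before any post-event token is seen.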
Problem

Research questions and friction points this paper is trying to address.

Remote Sensing
Convolutional Neural Networks
Transformer Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChangeMamba Framework
Mamba Architecture
Temporal Spatial Model
Hongruixuan Chen
The University of Tokyo, RIKEN
Deep Learning, Computer Vision, GeoAI, AI4EO, Multimodal Remote Sensing
Jian Song
Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
Chengxi Han
State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
Junshi Xia
RIKEN AIP
Machine Learning, Classification, Remote Sensing
Naoto Yokoya
The University of Tokyo, RIKEN
Remote Sensing, Computer Vision, Machine Learning, Data Fusion