🤖 AI Summary
To address two key bottlenecks in remote sensing semantic change detection (SCD), namely the limited class diversity and coarse granularity of existing datasets and the underuse of change information in current methods, this paper introduces LevirSCD, a fine-grained SCD benchmark covering 16 change categories and 210 specific change types. It also proposes FoBa, a foreground-background co-guided framework in which foreground cues highlight regions of interest while background context supplies complementary guidance, alleviating semantic ambiguity and improving the detection of subtle changes. A Gated Interaction Fusion (GIF) module enables bi-temporal feature interaction, and a simple consistency loss enhances spatial consistency. Experiments on SECOND, JL1, and the proposed LevirSCD show SeK improvements of 1.48%, 3.61%, and 2.81%, respectively, over current SOTA methods, validating the joint benefit of benchmark construction and model design.
📝 Abstract
Despite the remarkable progress achieved in remote sensing semantic change detection (SCD), two major challenges remain. At the data level, existing SCD datasets suffer from limited change categories, insufficient change types, and a lack of fine-grained class definitions, making them inadequate to fully support practical applications. At the methodological level, most current approaches underutilize change information, typically treating it as a post-processing step to enhance spatial consistency, which constrains further improvements in model performance. To address these issues, we construct a new benchmark for remote sensing SCD, LevirSCD. Focused on the Beijing area, the dataset covers 16 change categories and 210 specific change types, with more fine-grained class definitions (e.g., roads are divided into unpaved and paved roads). Furthermore, we propose a foreground-background co-guided SCD (FoBa) method, which leverages foregrounds that focus on regions of interest and backgrounds enriched with contextual information to guide the model collaboratively, thereby alleviating semantic ambiguity while enhancing its ability to detect subtle changes. Considering the requirements of bi-temporal interaction and spatial consistency in SCD, we introduce a Gated Interaction Fusion (GIF) module along with a simple consistency loss to further enhance the model's detection performance. Extensive experiments on three datasets (SECOND, JL1, and the proposed LevirSCD) demonstrate that FoBa achieves competitive results compared to current SOTA methods, with improvements of 1.48%, 3.61%, and 2.81% in the SeK metric, respectively. Our code and dataset are available at https://github.com/zmoka-zht/FoBa.
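The abstract does not spell out how the GIF module fuses bi-temporal features, so the following is only a generic NumPy sketch of the gated-fusion idea it alludes to: a sigmoid gate computed from the concatenated T1/T2 feature maps decides, per position and channel, how much of each temporal branch flows into the fused representation. All names (`gated_fusion`, `w_gate`, `b_gate`) and shapes are hypothetical, not taken from the paper or its code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_t1, f_t2, w_gate, b_gate):
    """Fuse bi-temporal feature maps with a learned gate.

    The gate is computed from the concatenated features; where it is
    close to 1 the fused output follows the T1 branch, and where it is
    close to 0 it follows the T2 branch.
    """
    concat = np.concatenate([f_t1, f_t2], axis=-1)  # (H, W, 2C)
    gate = sigmoid(concat @ w_gate + b_gate)        # (H, W, C), values in (0, 1)
    return gate * f_t1 + (1.0 - gate) * f_t2        # convex combination per element

# Toy bi-temporal features standing in for encoder outputs.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 4
f_t1 = rng.standard_normal((H, W, C))
f_t2 = rng.standard_normal((H, W, C))
w_gate = rng.standard_normal((2 * C, C)) * 0.1
b_gate = np.zeros(C)

fused = gated_fusion(f_t1, f_t2, w_gate, b_gate)
print(fused.shape)  # (8, 8, 4)
```

Because the gate stays in (0, 1), each fused value is an element-wise convex combination of the two temporal features, which is one simple way such a module can trade off change-focused and context-rich information.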