🤖 AI Summary
Existing scene change detection (SCD) methods exhibit severe generalization failure: performance drops by up to 73 percentage points from in-domain levels when tested in unseen environments or under different temporal conditions. To address this, we propose GeSCF, the first generalization-oriented SCD framework, together with GeSCD, its accompanying benchmark. Our core contributions are: (1) SAM-based zero-shot pseudo-mask generation; (2) a Geometric-Semantic Mask Matching mechanism; and (3) ChangeVPR, a challenging cross-environment SCD dataset spanning urban, suburban, and rural scenes with substantial domain shifts and temporal inconsistencies. We further introduce a generalization evaluation protocol explicitly designed to assess robustness against both domain shift and temporal misalignment. Extensive experiments show that GeSCF achieves an average +19.2% improvement on existing SCD benchmarks and a +30.0% gain on ChangeVPR, nearly doubling prior-art performance and substantially enhancing robustness and cross-domain transferability.
📝 Abstract
While current state-of-the-art Scene Change Detection (SCD) approaches achieve impressive results on the well-curated research data they are trained on, they become unreliable in unseen environments and under different temporal conditions: in-domain performance drops from 77.6% to 8.0% in a previously unseen environment and to 4.6% under a different temporal condition, calling for a generalizable SCD framework and benchmark. In this work, we propose the Generalizable Scene Change Detection Framework (GeSCF), which addresses unseen-domain performance and temporal consistency to meet the growing demand for "anything" SCD. Our method leverages the pre-trained Segment Anything Model (SAM) in a zero-shot manner. To this end, we design Initial Pseudo-mask Generation and Geometric-Semantic Mask Matching, seamlessly turning user-prompted, single-image segmentation into scene change detection for a pair of inputs without any guidance. Furthermore, we define the Generalizable Scene Change Detection (GeSCD) benchmark, along with novel metrics and an evaluation protocol, to facilitate research on generalizability in SCD. In the process, we introduce the ChangeVPR dataset, a collection of challenging image pairs covering diverse environments, including urban, suburban, and rural settings. Extensive experiments across various datasets demonstrate that GeSCF achieves an average performance gain of 19.2% on existing SCD datasets and 30.0% on the ChangeVPR dataset, nearly doubling prior-art performance. We believe our work lays a solid foundation for robust and generalizable SCD research.
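To make the matching idea concrete, here is a minimal sketch of a GeSCF-style decision step: masks from each image (e.g. SAM pseudo-masks) are compared both geometrically (region overlap) and semantically (feature similarity), and a mask in the second image with no joint match in the first is flagged as a change. All function names, thresholds, and the toy pixel-set/feature representation below are illustrative assumptions, not the paper's actual implementation.

```python
def iou(mask_a, mask_b):
    """Geometric overlap between two masks, given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

def cosine(u, v):
    """Semantic similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def detect_changes(masks_t0, masks_t1, iou_thr=0.5, sem_thr=0.8):
    """Return indices of t1 masks with no geometric-semantic match in t0.

    masks_t0, masks_t1: lists of (pixel_set, feature_vector) pairs, e.g.
    produced by a zero-shot segmenter such as SAM plus a feature extractor.
    A t1 mask counts as "unchanged" only if some t0 mask clears BOTH the
    geometric and the semantic threshold jointly.
    """
    changed = []
    for j, (m1, f1) in enumerate(masks_t1):
        matched = any(
            iou(m0, m1) >= iou_thr and cosine(f0, f1) >= sem_thr
            for m0, f0 in masks_t0
        )
        if not matched:
            changed.append(j)
    return changed
```

For example, if the second image contains the same object as the first plus one new region, only the new region's index is returned, because the shared object passes both the IoU and the cosine-similarity checks.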