🤖 AI Summary
Unaligned scene change detection (SCD) faces challenges including the failure of appearance matching under large viewpoint discrepancies, difficulty in modeling occlusions, and a lack of geometric reasoning. To address these, we propose the first training-free vision–geometry collaborative framework that jointly leverages geometric foundation models (GFMs) and visual foundation models (VFMs), explicitly modeling cross-view correspondences, visual overlap regions, and occlusion states. By incorporating 3D geometric priors, our approach overcomes the generalization bottleneck inherent in purely 2D supervision, enabling robust cross-image alignment and precise change localization. Evaluated on three major unaligned benchmarks—PSCD, ChangeSim, and PASLCD—our method consistently outperforms existing state-of-the-art approaches. The results demonstrate that geometric priors are critical for enhancing both robustness and interpretability in SCD, establishing a new paradigm for geometry-aware change understanding.
📝 Abstract
Unaligned Scene Change Detection (SCD) aims to detect scene changes between image pairs captured at different times without assuming viewpoint alignment. To handle viewpoint variations, current methods rely solely on 2D visual cues to establish cross-image correspondences that assist change detection. However, large viewpoint changes can alter visual observations, causing appearance-based matching to drift or fail. Additionally, supervision limited to 2D change masks from small-scale SCD datasets restricts the learning of generalizable multi-view knowledge, making it difficult to reliably identify visual overlaps and handle occlusions. This lack of explicit geometric reasoning represents a critical yet overlooked limitation. In this work, we are the first to leverage geometric priors from a Geometric Foundation Model to address the core challenges of unaligned SCD: reliable identification of visual overlaps, robust correspondence establishment, and explicit occlusion detection. Building on these priors, we propose a training-free framework that integrates them with the powerful representations of a visual foundation model to enable reliable change detection under viewpoint misalignment. Through extensive evaluation on the PSCD, ChangeSim, and PASLCD datasets, we demonstrate that our approach achieves superior and robust performance. Our code will be released at https://github.com/ZilingLiu/GeoSCD.