🤖 AI Summary
Manual registration of misaligned multi-temporal remote sensing images severely hampers automated change detection. Method: We propose MatchCD, an end-to-end self-supervised framework that jointly models geometric deformation estimation and semantic change detection. It introduces a geometry-estimation-driven self-supervised pretraining paradigm enabling direct end-to-end inference on native large-scale imagery (up to 6K×4K) without tiling or manual intervention. Contrastive learning acquires generalizable representations, which are then zero-shot transferred to the joint registration–change-detection optimization task. Contribution/Results: MatchCD achieves superior robustness and accuracy under severe geometric distortions and complex real-world conditions, fully eliminating the need for manual registration. It advances change detection toward unified, fully automated pipelines.
📝 Abstract
As an essential procedure in Earth observation systems, change detection (CD) aims to reveal the spatiotemporal evolution of the observed regions. A key prerequisite for existing change detection algorithms is that the multi-temporal images share aligned geo-references, obtained through fine-grained registration. However, in the majority of real-world scenarios, the original images must first be registered manually, which significantly increases the complexity of the CD workflow. In this paper, we propose a self-supervision-motivated CD framework with geometric estimation, called "MatchCD". Specifically, the proposed MatchCD framework exploits zero-shot capability to optimize the encoder with self-supervised contrastive representations; the encoder is then reused in the downstream image registration and change detection tasks to handle bi-temporal misalignment and object change simultaneously. Moreover, unlike conventional change detection, which requires splitting the full-frame image into small patches, our MatchCD framework can directly process the original large-scale image (e.g., 6K×4K resolution) with promising performance. Results in multiple complex scenarios with significant geometric distortion demonstrate the effectiveness of the proposed framework.