Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

📅 2024-09-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address false positives in change detection caused by illumination, seasonal, and viewpoint variations in complex scenes, this paper proposes a robust bi-temporal matching method based on DINOv2. The approach freezes the DINOv2 backbone to preserve its general-purpose visual representation capability, introduces a global cross-attention mechanism to model pixel-level correspondences across images, and designs a multi-scale change discrimination head for fine-grained change localization. Combining a frozen backbone with global cross-attention significantly improves feature matching accuracy and generalization under large viewpoint discrepancies. The method achieves state-of-the-art F1-scores on VL-CMU-CD, PSCD, and their viewpoint-augmented benchmarks. It is robust to photometric distortions and geometric deformations, and it outperforms existing approaches when fine-tuned to new environments.

📝 Abstract
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2, and integrates full-image cross-attention to address key challenges such as varying lighting, seasonal variations, and viewpoint differences. In order to effectively learn correspondences and mis-correspondences between an image pair for the change detection task, we propose to a) "freeze" the backbone in order to retain the generality of dense foundation features, and b) employ "full-image" cross-attention to better tackle the viewpoint variations between the image pair. We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions. Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs. The results indicate our method's superior generalization capabilities over existing state-of-the-art approaches, showing robustness against photometric and geometric variations as well as better overall generalization when fine-tuned to adapt to new environments. Detailed ablation studies further validate the contributions of each component in our architecture. Our source code is available at: https://github.com/ChadLin9596/Robust-Scene-Change-Detection.
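
The abstract reports results as F1-scores. As a reminder of what that metric measures for binary change masks, here is a minimal sketch in plain Python. The function name `f1_score`, the nested-list mask format, and the convention of returning 0.0 when there are no positive pixels are assumptions for illustration; the paper's exact evaluation protocol (for example, per-image versus dataset-level aggregation) is not stated in this summary.

```python
def f1_score(pred_mask, gt_mask):
    """F1 over pixels for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1      # change predicted and actually present
            elif p and not g:
                fp += 1      # change predicted but absent (false alarm)
            elif g and not p:
                fn += 1      # real change that was missed
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when there are no positive pixels at all.
    return 2 * tp / denom if denom else 0.0


# Toy 2x3 masks: one hit, one false alarm, one miss.
pred = [[1, 1, 0],
        [0, 0, 0]]
gt = [[1, 0, 0],
      [1, 0, 0]]
score = f1_score(pred, gt)  # 2*1 / (2*1 + 1 + 1) = 0.5
```

False alarms triggered by lighting or viewpoint shifts count as false positives here, so they lower precision and therefore F1.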
Problem

Research questions and friction points this paper is trying to address.

Robust scene change detection
Overcome lighting and viewpoint variations
Enhance generalization with cross-attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes DINOv2 for feature extraction
Implements full-image cross-attention mechanism
Freezes backbone to retain feature generality
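
The idea behind full-image cross-attention can be illustrated with a small sketch: every patch feature from one image attends to all patch features of the other image, so a match can still be found when a viewpoint change moves content to a different location. The code below is a simplified, single-head version in plain Python without learned projections. The toy vectors stand in for frozen DINOv2 patch features, and all names (`cross_attention`, `feats_t0`, `feats_t1`) are hypothetical. It is not the authors' implementation, which is available in the linked repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (single head, no learned weights).

    Each query token attends over ALL key tokens, i.e. the whole other image.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(wj * v[d] for wj, v in zip(w, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy "patch features" (assumed stand-ins for frozen backbone outputs).
feats_t0 = [[4.0, 0.0], [0.0, 4.0]]
# Same content at swapped positions, plus one unrelated patch.
feats_t1 = [[0.0, 4.0], [4.0, 0.0], [0.1, 0.1]]

attended, attn = cross_attention(feats_t0, feats_t1, feats_t1)
# Index of the most-attended patch in the second image for each query patch.
best_match = [max(range(len(w)), key=w.__getitem__) for w in attn]
```

In this toy case each query patch puts most of its attention on its displaced counterpart rather than on the patch at the same index, which is the behavior a full-image attention window makes possible. A practical model would use a tensor library, learned query/key/value projections, and multiple heads.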