AI Summary
This work addresses inaccurate category recognition and ambiguous change localization in open-vocabulary change detection with a training-free, two-stage framework. It first uses vision foundation models, such as SAM and DINOv2, to generate category-agnostic change candidate regions, then performs semantic classification with a vision-language model such as CLIP. The core innovations are an Open-Vocabulary Diffusion-guided Prototype Retrieval (OpenDPR) mechanism that improves fine-grained land-cover recognition and a Spatial-to-Change (S2C) weakly supervised module that compensates for the absence of explicit change priors. Evaluated on four remote sensing benchmarks, the method significantly outperforms existing approaches under both fully supervised and weakly supervised settings.
Abstract
Open-vocabulary change detection (OVCD) seeks to recognize arbitrary changes of interest by enabling generalization beyond a fixed set of predefined classes. We reformulate OVCD as a two-stage pipeline: first generate class-agnostic change proposals using visual foundation models (VFMs) such as SAM and DINOv2, and then perform category identification with vision-language models (VLMs) such as CLIP. We reveal that category identification errors are the primary bottleneck of OVCD, mainly due to the limited ability of VLMs based on image-text matching to represent fine-grained land-cover categories. To address this, we propose OpenDPR, a training-free vision-centric diffusion-guided prototype retrieval framework. OpenDPR leverages diffusion models to construct diverse prototypes for target categories offline, and to perform similarity retrieval with change proposals in the visual space during inference. The secondary bottleneck lies in change localization, due to the inherent lack of change priors in VFMs. To bridge this gap, we design a spatial-to-change weakly supervised change detection module named S2C to adapt their strong spatial modeling capabilities for change localization. Integrating the pretrained S2C into OpenDPR leads to an optional weakly supervised variant named OpenDPR-W, which further improves OVCD with minimal supervision. Experimental results on four benchmark datasets demonstrate that the proposed methods achieve state-of-the-art performance under both supervision modes. Code is available at https://github.com/guoqi2002/OpenDPR.
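The abstract's core retrieval idea, matching each class-agnostic change proposal against an offline bank of per-category visual prototypes by similarity in embedding space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`build_prototype_bank`, `classify_proposal`) are hypothetical, and the embeddings here are plain NumPy arrays standing in for diffusion-generated prototype features and proposal features.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors so that dot products become cosine similarities."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_prototype_bank(prototypes_by_class):
    """Offline step: store L2-normalized prototype embeddings per category.

    prototypes_by_class: dict mapping class name -> (n_i, d) array of
    visual embeddings (in the paper, derived from diffusion-generated
    exemplars; here, arbitrary vectors for illustration).
    """
    names, banks = [], []
    for name, feats in prototypes_by_class.items():
        names.append(name)
        banks.append(l2_normalize(feats))
    return names, banks

def classify_proposal(proposal_embedding, names, banks):
    """Inference step: assign the category whose prototype set contains
    the most similar prototype (max cosine similarity) to the proposal."""
    q = l2_normalize(proposal_embedding)
    scores = [float((bank @ q).max()) for bank in banks]
    best = int(np.argmax(scores))
    return names[best], scores[best]

# Toy usage with 3-D stand-in embeddings for two land-cover categories.
names, banks = build_prototype_bank({
    "building": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "water":    np.array([[0.0, 1.0, 0.0]]),
})
label, score = classify_proposal([0.95, 0.05, 0.0], names, banks)
```

Retrieval happens entirely in the visual embedding space, which is the point of the vision-centric design: no image-text matching is involved at inference time, so fine-grained categories are distinguished by how close a proposal sits to concrete visual exemplars rather than to a text prompt.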