OmniOVCD: Streamlining Open-Vocabulary Change Detection with SAM 3

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations in change detection: the reliance of conventional methods on predefined categories, and the feature misalignment and system instability that arise when existing open-vocabulary methods fuse multiple models. To overcome these challenges, we propose OmniOVCD, a framework that introduces Segment Anything Model 3 (SAM 3) to this task for the first time. Leveraging SAM 3's decoupled output heads, we design a Synergistic Fusion to Instance Decoupling (SFID) strategy that integrates semantic, instance, and presence cues within a single model, enabling accurate category recognition and consistent instance alignment across the two acquisition times. Experiments on four benchmarks (LEVIR-CD, WHU-CD, S2Looking, and SECOND) show state-of-the-art performance, with IoU scores of 67.2, 66.5, 24.5, and 27.1 (class-average), respectively.
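
For readers unfamiliar with the metric, the IoU scores quoted above can be illustrated with a minimal sketch. It assumes binary masks stored as lists of rows and assumes that "class-average" means the unweighted mean of per-class IoU; the paper's exact evaluation protocol is not given here, and the function names `iou` and `class_averaged_iou` are hypothetical.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def class_averaged_iou(pred_by_class, target_by_class):
    """Unweighted mean of per-class IoU (an assumed reading of 'class-average')."""
    scores = [iou(pred_by_class[c], target_by_class[c]) for c in target_by_class]
    return sum(scores) / len(scores)


# Toy example with two hypothetical classes on a 2x2 image.
pred = {"building": [[1, 1], [0, 0]], "tree": [[0, 0], [1, 1]]}
target = {"building": [[1, 0], [0, 0]], "tree": [[0, 0], [1, 1]]}
print(class_averaged_iou(pred, target))  # building IoU 0.5, tree IoU 1.0 -> 0.75
```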

📝 Abstract
Change Detection (CD) is a fundamental task in remote sensing that monitors the evolution of land cover over time. Building on CD, Open-Vocabulary Change Detection (OVCD) aims to reduce the reliance on predefined categories. Existing training-free OVCD methods mostly use CLIP to identify categories and need extra models such as DINO to extract features. However, combining different models often causes feature-matching problems and makes the system unstable. The recently introduced Segment Anything Model 3 (SAM 3) integrates segmentation and identification capabilities within one promptable model, which offers new possibilities for the OVCD task. In this paper, we propose OmniOVCD, a standalone framework designed for OVCD. By leveraging the decoupled output heads of SAM 3, we propose a Synergistic Fusion to Instance Decoupling (SFID) strategy. SFID first fuses the semantic, instance, and presence outputs of SAM 3 to construct land-cover masks, and then decomposes them into individual instance masks for change comparison. This design preserves high accuracy in category recognition and maintains instance-level consistency across images, so the model can generate accurate change masks. Experiments on four public benchmarks (LEVIR-CD, WHU-CD, S2Looking, and SECOND) demonstrate SOTA performance, achieving IoU scores of 67.2, 66.5, 24.5, and 27.1 (class-average), respectively, surpassing all previous methods.
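
The fuse-then-decouple idea behind SFID can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation and not the SAM 3 API: the fusion rule (thresholded semantic map united with instance masks, gated by a presence score), the thresholds, the use of 4-connected components for decoupling, and IoU-based instance matching are all hypothetical choices, as are the names `fuse`, `decouple`, and `change_pixels`.

```python
from collections import deque


def fuse(semantic, instances, presence, sem_thr=0.5, pres_thr=0.5):
    """Assumed fusion rule: semantic map OR instance masks, gated by presence."""
    h, w = len(semantic), len(semantic[0])
    if presence < pres_thr:  # category judged absent from the image
        return [[0] * w for _ in range(h)]
    return [
        [1 if semantic[y][x] >= sem_thr or any(m[y][x] for m in instances) else 0
         for x in range(w)]
        for y in range(h)
    ]


def decouple(mask):
    """Split a binary mask into 4-connected components (sets of (y, x) pixels)."""
    h, w = len(mask), len(mask[0])
    seen, components = set(), []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or (y, x) in seen:
                continue
            comp, queue = set(), deque([(y, x)])
            seen.add((y, x))
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                comp.add((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(comp)
    return components


def change_pixels(inst_t1, inst_t2, match_thr=0.5):
    """Pixels of instances that have no IoU >= match_thr counterpart in the other image."""
    def unmatched(a_list, b_list):
        out = set()
        for a in a_list:
            if not any(len(a & b) / len(a | b) >= match_thr for b in b_list):
                out |= a
        return out
    return unmatched(inst_t1, inst_t2) | unmatched(inst_t2, inst_t1)


# Toy scores for a hypothetical "building" prompt: one new building appears at time 2.
sem_t1 = [[0.9, 0.9, 0.0, 0.0], [0.9, 0.9, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
sem_t2 = [[0.9, 0.9, 0.0, 0.8], [0.9, 0.9, 0.0, 0.8], [0.0, 0.0, 0.0, 0.0]]
mask_t1 = fuse(sem_t1, [], presence=0.9)
mask_t2 = fuse(sem_t2, [], presence=0.9)
changed = change_pixels(decouple(mask_t1), decouple(mask_t2))
print(sorted(changed))  # [(0, 3), (1, 3)]
```

Comparing instances rather than raw pixels means the unchanged building on the left contributes nothing to the change mask, which is one way to read the abstract's claim about instance-level consistency.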
Problem

Research questions and friction points this paper is trying to address.

Open-Vocabulary Change Detection
Change Detection
Remote Sensing
Feature Matching
Model Integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-Vocabulary Change Detection
Segment Anything Model 3
Synergistic Fusion to Instance Decoupling
Instance-level Consistency
Promptable Segmentation