MemOVCD: Training-Free Open-Vocabulary Change Detection via Cross-Temporal Memory Reasoning and Global-Local Adaptive Rectification

📅 2026-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two problems in open-vocabulary change detection for remote sensing imagery: weak temporal coupling during semantic reasoning, which makes genuine semantic changes hard to separate from non-semantic appearance differences, and patch-dominant inference, which weakens global semantic continuity and fragments change regions. The authors reformulate bi-temporal change detection as a two-frame tracking task and introduce a training-free cross-temporal memory reasoning mechanism. The method aggregates semantic evidence through weighted bidirectional propagation, smooths abrupt appearance shifts with histogram-aligned transition frames, and fuses local and global-view predictions through a global-local adaptive rectification strategy. Building on foundation models such as SAM, DINO, and CLIP, the method reports favorable performance on five benchmarks across two change detection tasks in open-vocabulary settings.
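The two fusion steps named in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the fixed scalar weights, and the confidence measure (distance of a probability from 0.5) are all assumptions chosen for illustration, since the summary does not specify how the weights are computed.

```python
import numpy as np

def fuse_bidirectional(p_fwd, p_bwd, w_fwd=0.5, w_bwd=0.5):
    """Weighted average of two change-probability maps.

    p_fwd / p_bwd stand for maps obtained by propagating from t1 to t2 and
    from t2 to t1. The scalar weights are a hypothetical choice.
    """
    return (w_fwd * p_fwd + w_bwd * p_bwd) / (w_fwd + w_bwd)

def confidence(p):
    """Assumed confidence measure: distance from 0.5, rescaled to [0, 1]."""
    return np.abs(p - 0.5) * 2.0

def fuse_global_local(p_local, p_global, eps=1e-6):
    """Per-pixel blend that favors whichever view is more confident.

    p_local is a patch-level prediction and p_global a full-view prediction
    resampled to the same grid (both hypothetical inputs).
    """
    c_local = confidence(p_local)
    c_global = confidence(p_global)
    w_local = (c_local + eps) / (c_local + c_global + 2 * eps)
    return w_local * p_local + (1.0 - w_local) * p_global
```

With equal weights, `fuse_bidirectional` reduces to a plain mean of the two directions. In `fuse_global_local`, a pixel where the local view is undecided (probability near 0.5) takes its value almost entirely from the global view, which is one simple way to restore spatial consistency without discarding confident local detail.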
📝 Abstract
Open-vocabulary change detection aims to identify semantic changes in bi-temporal remote sensing images without predefined categories. Recent methods combine foundation models such as SAM, DINO and CLIP, but typically process each timestamp independently or interact only at the final comparison stage. Such paradigms suffer from insufficient temporal coupling during semantic reasoning, which limits their ability to distinguish genuine semantic changes from non-semantic appearance discrepancies. In addition, patch-dominant inference on high-resolution images often weakens global semantic continuity and produces fragmented change regions. To address these issues, we propose MemOVCD, a training-free open-vocabulary change detection framework based on cross-temporal memory reasoning and global-local adaptive rectification. Specifically, we reformulate bi-temporal change detection as a two-frame tracking problem and introduce weighted bidirectional propagation to aggregate semantic evidence from both temporal directions. To stabilize memory propagation across large temporal gaps, we construct histogram-aligned transition frames to smooth abrupt appearance changes. Moreover, a global-local adaptive rectification strategy adaptively fuses local and global-view predictions, improving spatial consistency while preserving fine-grained details. Experiments on five benchmarks demonstrate that MemOVCD achieves favorable performance on two change detection tasks, validating its effectiveness and generalization under diverse open-vocabulary settings.
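Histogram alignment itself is a standard operation, and a short NumPy sketch shows what a "histogram-aligned transition frame" could look like. The `match_histogram` function below is ordinary CDF-based histogram matching. The `transition_frames` function is an assumption made for illustration only: the abstract does not say how the transition frames are built, so this version simply re-tones the first image toward the second image's histogram and blends the result with the original.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap one channel of `source` so its intensity CDF follows `reference`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical CDFs of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)

def transition_frames(img_t1, img_t2, num_steps=1):
    """Hypothetical transition frames between two H x W x C images.

    Keeps the content of img_t1 while moving its per-channel intensity
    distribution toward img_t2 in `num_steps` intermediate blends.
    """
    aligned = np.stack(
        [match_histogram(img_t1[..., c], img_t2[..., c])
         for c in range(img_t1.shape[-1])],
        axis=-1,
    )
    alphas = np.linspace(0.0, 1.0, num_steps + 2)[1:-1]
    return [(1.0 - a) * img_t1 + a * aligned for a in alphas]
```

A tracking-style pipeline could then process the sequence `[img_t1, *transition_frames(img_t1, img_t2), img_t2]` so that the appearance shift between consecutive frames is smaller than between the raw pair.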
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary change detection
temporal coupling
semantic reasoning
global-local consistency
remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-temporal memory reasoning
global-local adaptive rectification
training-free
open-vocabulary change detection
histogram-aligned transition frames
Zuzheng Kuang
School of Information and Communications Engineering, Xi’an Jiaotong University
Honghao Chang
School of Information and Communications Engineering, Xi’an Jiaotong University
Boqiang Liang
School of Information and Communications Engineering, Xi’an Jiaotong University
Haoqian Wang
School of Information and Communications Engineering, Xi’an Jiaotong University
Lijun He
General Electric Global Research Center
Fan Li
School of Information and Communications Engineering, Xi’an Jiaotong University
Haixia Bi
School of Information and Communications Engineering, Xi’an Jiaotong University