🤖 AI Summary
This paper addresses precise change detection in multi-view image/video sequences: identifying object appearance, disappearance, and relocation across time while mitigating spurious changes induced by viewpoint discrepancies. We propose a “3D alignment + semantic-spatial joint comparison” paradigm: a training-free framework that integrates off-the-shelf pre-trained models for 3D reconstruction, instance segmentation, and visual encoding to perform cross-view 3D geometric alignment, instance-level region extraction, and fine-grained feature comparison. To support systematic evaluation, we introduce SceneDiff, the first multi-view change detection benchmark with instance-level annotations. Our method achieves 94% and 37.4% relative AP improvements on the multi-view and two-view benchmarks respectively, substantially outperforming prior approaches. This work establishes a new pathway toward unsupervised, high-accuracy cross-view change understanding.
📝 Abstract
We investigate the problem of identifying objects that have been added, removed, or moved between a pair of captures (images or videos) of the same scene at different times. Detecting such changes is important for many applications, such as robotic tidying or construction progress and safety monitoring. A major challenge is that varying viewpoints can cause objects to falsely appear changed. We introduce SceneDiff Benchmark, the first multiview change detection benchmark with object instance annotations, comprising 350 diverse video pairs with thousands of changed objects. We also introduce the SceneDiff method, a new training-free approach for multiview object change detection that leverages pretrained 3D, segmentation, and image encoding models to robustly predict across multiple benchmarks. Our method aligns the captures in 3D, extracts object regions, and compares spatial and semantic region features to detect changes. Experiments on multi-view and two-view benchmarks demonstrate that our method outperforms existing approaches by large margins (94% and 37.4% relative AP improvements). The benchmark and code will be publicly released.
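The comparison step described above (after 3D alignment and region extraction, match regions across captures and classify each as added, removed, or moved) can be sketched as a minimal, self-contained example. Everything here is an illustrative assumption rather than the paper's implementation: the greedy cosine-similarity matching, the `detect_changes` name, the dict-based region layout, and the similarity/displacement thresholds are all hypothetical; in the real pipeline the features would come from a pretrained image encoder and the centroids from the 3D reconstruction.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_changes(regions_t0, regions_t1, sim_thresh=0.8, move_thresh=0.2):
    """Classify region-level changes between two aligned captures.

    Each region is a dict with a 'centroid' (3D position in the shared,
    aligned coordinate frame) and a 'feat' (semantic feature vector).
    Greedy matching by feature similarity; thresholds are illustrative.
    """
    changes = {"added": [], "removed": [], "moved": []}
    used = set()
    for i, r0 in enumerate(regions_t0):
        best_j, best_sim = None, sim_thresh
        for j, r1 in enumerate(regions_t1):
            if j in used:
                continue
            s = cosine(r0["feat"], r1["feat"])
            if s > best_sim:
                best_j, best_sim = j, s
        if best_j is None:
            # No sufficiently similar region in the later capture: removed.
            changes["removed"].append(i)
        else:
            used.add(best_j)
            # Matched semantically but displaced in 3D: moved.
            disp = np.linalg.norm(r0["centroid"] - regions_t1[best_j]["centroid"])
            if disp > move_thresh:
                changes["moved"].append((i, best_j))
    # Regions in the later capture with no match in the earlier one: added.
    changes["added"] = [j for j in range(len(regions_t1)) if j not in used]
    return changes
```

A production variant would replace the greedy loop with optimal bipartite matching (e.g. the Hungarian algorithm) and combine spatial and semantic cues in the matching cost, but the sketch captures the core idea: geometric alignment turns viewpoint differences into comparable 3D positions, so only genuine object changes survive the comparison.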