VGGT-CD: Training-Free Robust Registration for 3D Change Detection

📅 2026-05-16

🤖 AI Summary
This work targets several failure modes of multi-view change analysis: viewpoint changes misread as physical changes, missing depth information in 2D methods, scale drift across independently reconstructed epochs, scene changes that corrupt registration, and edge noise. It proposes a training-free, two-stage registration method. In the coarse stage, joint inference over sparse keyframes establishes a unified metric space and yields an initial Sim(3) transformation. In the fine stage, static-background correspondences are isolated to purify the dense reconstructions, and a closed-form centroid alignment refines the translation while scale and rotation stay fixed. For the first time, this decouples cross-temporal 3D registration from change-induced interference without any training, and a residual self-check guarantees that the refinement does not degrade the coarse alignment. On an 11-scene benchmark from the World Across Time dataset, the method reduces Absolute Trajectory Error by 44% outdoors and 59% indoors, completes registration more than six times faster, and produces high-purity 3D change maps.
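The fine-stage idea described above (translation-only refinement with a residual self-check) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of mean Euclidean distance as the residual, and the accept/reject rule are all assumptions. With scale and rotation fixed, the least-squares translation update is simply the difference between the two centroids of the matched static-background points.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def mean_residual(src, dst):
    """Mean Euclidean distance between matched points (assumed residual metric)."""
    return fmean(dist(p, q) for p, q in zip(src, dst))


def refine_translation(src, dst):
    """Hypothetical helper: translation-only refinement with a self-check.

    src: static-background points from one epoch, already mapped by the
         coarse Sim(3) estimate.
    dst: the matched static-background points from the other epoch.
    Returns (delta_t, accepted). Scale and rotation are left untouched.
    """
    if not src or len(src) != len(dst):
        raise ValueError("src and dst must be non-empty and the same length")

    # Closed-form update: shift src so that both centroids coincide.
    c_src, c_dst = centroid(src), centroid(dst)
    delta = tuple(b - a for a, b in zip(c_src, c_dst))
    shifted = [tuple(x + d for x, d in zip(p, delta)) for p in src]

    # Residual self-check (assumed rule): keep the update only if the
    # residual does not increase; otherwise fall back to the coarse result.
    if mean_residual(shifted, dst) <= mean_residual(src, dst):
        return delta, True
    return (0.0, 0.0, 0.0), False
```

Because a rejected update returns a zero shift, the refined alignment under this rule is never worse than the coarse one in terms of the chosen residual, which is the kind of non-degradation property the summary refers to.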
📝 Abstract
3D change detection from multi-view images is essential for urban monitoring, disaster assessment, and autonomous driving. However, existing methods predominantly operate in the 2D domain, where viewpoint variations are mistaken for physical changes and depth is unavailable. While visual geometry foundation models like VGGT rapidly produce dense point clouds from unposed images, independent per-epoch reconstruction encounters fundamental obstacles: unpredictable inter-epoch scale ambiguity, a registration-change paradox in which scene changes corrupt alignment, and pervasive edge-flying noise. To address these challenges, we present VGGT-CD, a training-free pipeline that decouples cross-temporal registration from dynamic-change interference. In the Coarse Stage, sparse keyframe joint inference establishes a unified metric space and yields an initial Sim(3) prior. In the Fine Stage, dense reconstructions are purified by isolating static-background correspondences. A closed-form centroid alignment refines the translation while locking scale and rotation, using a residual self-check to mathematically guarantee non-degradation. Evaluated on an 11-scene benchmark from the World Across Time dataset, VGGT-CD reduces Absolute Trajectory Error by 44% outdoors and 59% indoors. It completes registration over 6 times faster, producing high-purity 3D change maps without task-specific training.
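To make the reported metric concrete, the sketch below shows how a Sim(3) transform is applied to a point and how Absolute Trajectory Error is commonly computed as the root-mean-square distance between aligned camera positions and ground truth. The abstract does not specify the exact evaluation protocol, so the RMSE form and the function names here are assumptions for illustration only.

```python
from math import sqrt


def apply_sim3(scale, rotation, translation, point):
    """Map a 3D point with x' = scale * R @ x + t.

    rotation is a 3x3 matrix given as nested lists (row-major).
    """
    rotated = [sum(rotation[r][c] * point[c] for c in range(3)) for r in range(3)]
    return tuple(scale * rotated[r] + translation[r] for r in range(3))


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between matched camera centres.

    Assumes the estimated trajectory has already been aligned to the
    ground-truth frame (for example with apply_sim3).
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and the same length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return sqrt(sum(squared_errors) / len(squared_errors))
```

In this form, a registration method that yields a better Sim(3) estimate lowers the error directly, since every estimated camera position is mapped through the same scale, rotation, and translation before comparison.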
Problem

Research questions and friction points this paper is trying to address.

3D change detection
cross-temporal registration
scale ambiguity
registration-change paradox
edge-flying noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
3D change detection
cross-temporal registration
static-background correspondence
Sim(3) alignment