🤖 AI Summary
This paper addresses the challenge of maintaining temporal consistency in 3D scene representations under sparse-view conditions, which are common in urban planning, disaster assessment, and heritage preservation, where dense scanning is infeasible. It proposes a spatiotemporal sparse reconstruction and updating framework that integrates cross-temporal camera alignment, interference-based confidence initialization, and progressive cross-temporal optimization, leveraging 3D Gaussian splatting to jointly exploit sparse image observations and historical priors for bidirectional scene updating and historical state recovery. Experiments show that the proposed approach significantly outperforms existing baselines in both reconstruction fidelity and data efficiency, enabling high-fidelity, low-overhead spatiotemporal 3D versioning and digital twin construction and supporting robust long-term scene evolution modeling from limited observational data.
📝 Abstract
Maintaining consistent 3D scene representations over time is a significant challenge in computer vision. Updating 3D scenes from sparse-view observations is crucial for real-world applications such as urban planning, disaster assessment, and historical site preservation, where dense scans are often unavailable or impractical. In this paper, we propose Cross-Temporal 3D Gaussian Splatting (Cross-Temporal 3DGS), a novel framework for efficiently reconstructing and updating 3D scenes across different time periods using sparse images and previously captured scene priors. Our approach comprises three stages: 1) cross-temporal camera alignment, which estimates and aligns camera poses across different timestamps; 2) interference-based confidence initialization, which identifies unchanged regions between timestamps to guide updates; and 3) progressive cross-temporal optimization, which iteratively integrates historical prior information into the 3D scene to enhance reconstruction quality. Our method supports non-continuous capture, enabling not only the refinement of existing scenes with new sparse views but also the recovery of past scenes from limited data with the help of current captures. Furthermore, we demonstrate the potential of this approach to archive temporal changes using only sparse images, which can later be reconstructed into detailed 3D representations as needed. Experimental results show significant improvements over baseline methods in reconstruction quality and data efficiency, making this approach a promising solution for scene versioning, cross-temporal digital twins, and long-term spatial documentation.
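The abstract gives no implementation details, so the three stages can only be illustrated schematically. The toy sketch below is an assumption-laden illustration, not the paper's method: `align_camera_centers` stands in for stage 1 with a plain Kabsch/Procrustes fit on camera centers (the actual method aligns full poses), `init_confidence` stands in for stage 2 with a simple photometric-residual threshold (the name and the threshold `tau` are invented here), and `progressive_update` stands in for stage 3 with a confidence-weighted pull toward the historical prior versus the new observations, whereas real 3DGS optimization acts on per-Gaussian position, covariance, opacity, and color parameters.

```python
import numpy as np

def align_camera_centers(centers_hist, centers_new):
    """Hypothetical stand-in for stage 1 (cross-temporal camera alignment):
    rigidly align new camera centers to the historical coordinate frame
    using a Kabsch/Procrustes fit."""
    c_h = centers_hist.mean(axis=0)
    c_n = centers_new.mean(axis=0)
    H = (centers_new - c_n).T @ (centers_hist - c_h)  # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_h - R @ c_n
    return (R @ centers_new.T).T + t  # new centers mapped into the old frame

def init_confidence(hist_render, new_view, tau=0.05):
    """Hypothetical stand-in for stage 2 (confidence initialization): mark
    pixels whose photometric residual between a render of the historical
    scene and a new sparse view is below tau as 'unchanged' (confidence 1)."""
    resid = np.abs(hist_render - new_view).mean(axis=-1)  # per-pixel residual
    return (resid < tau).astype(float)

def progressive_update(params, prior, obs, conf, iters=60, lr=0.3):
    """Hypothetical stand-in for stage 3 (progressive cross-temporal
    optimization): gradient descent that pulls high-confidence (unchanged)
    parameters toward the historical prior and low-confidence ones toward
    the new observations."""
    for _ in range(iters):
        grad = (1.0 - conf) * (params - obs) + conf * (params - prior)
        params = params - lr * grad
    return params
```

By construction, `progressive_update` converges to the per-element blend `conf * prior + (1 - conf) * obs`, which captures the bidirectional idea in miniature: swapping which array plays "prior" and which plays "obs" turns a forward update into a recovery of the past state.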