🤖 AI Summary
To address the significant rendering-quality degradation of 4D Gaussian Splatting (4DGS) for dynamic scenes under sparse input views, which stems from inconsistent spatiotemporal geometry learning, this paper proposes a real-time, high-fidelity novel-view synthesis framework that incorporates spatiotemporal geometric consistency. Methodologically, the authors design a dynamic consistency verification mechanism and introduce global-local depth regularization to jointly constrain the spatiotemporal uncertainties of multi-view stereo and monocular depth estimation. Geometric prior distillation and spatiotemporal regularization further strengthen the robustness of 4D Gaussian Splatting in sparse-view settings for both geometry and appearance modeling. Evaluated on the N3DV and Technicolor datasets, the method achieves PSNR gains of +2.62 dB over RF-DeRF and +1.58 dB over the original 4DGS, while enabling lightweight edge deployment.
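The summary names the two mechanisms without detailing them. As a rough illustration of what a global-local depth regularizer commonly looks like in sparse-view Gaussian Splatting pipelines, the sketch below applies a scale- and shift-invariant Pearson-correlation loss between the rendered depth and a monocular depth prior, once over the whole image and once over random local patches. All function names and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch


def pearson_loss(pred, target, eps=1e-8):
    """1 - Pearson correlation: a scale- and shift-invariant depth alignment term."""
    pred = pred.flatten() - pred.flatten().mean()
    target = target.flatten() - target.flatten().mean()
    return 1.0 - (pred * target).sum() / (pred.norm() * target.norm() + eps)


def global_local_depth_loss(rendered_depth, mono_depth, patch_size=32, num_patches=64):
    """Hypothetical global-local depth regularizer (not the authors' code).

    The global term aligns the full rendered depth map with the monocular prior;
    the local term repeats the alignment on random patches to preserve fine-grained
    relative geometry that a single image-wide correlation can wash out.
    Assumes depth maps of shape (..., H, W) with H, W > patch_size.
    """
    global_term = pearson_loss(rendered_depth, mono_depth)

    H, W = rendered_depth.shape[-2:]
    local_terms = []
    for _ in range(num_patches):
        y = torch.randint(0, H - patch_size, (1,)).item()
        x = torch.randint(0, W - patch_size, (1,)).item()
        local_terms.append(pearson_loss(
            rendered_depth[..., y:y + patch_size, x:x + patch_size],
            mono_depth[..., y:y + patch_size, x:x + patch_size]))
    return global_term + torch.stack(local_terms).mean()
```

In this reading, the global term anchors the overall depth ordering of the scene while the patch term supervises local structure, which is where monocular priors are typically most reliable.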
📝 Abstract
Gaussian Splatting has emerged as a novel approach to view synthesis of dynamic scenes and shows great potential in AIoT applications such as digital twins. However, recent dynamic Gaussian Splatting methods degrade significantly when only sparse input views are available, limiting their applicability in practice. The issue arises from the incoherent learning of 4D geometry as the number of input views decreases. This paper presents GC-4DGS, a novel framework that infuses geometric consistency into 4D Gaussian Splatting (4DGS), offering real-time, high-quality dynamic scene rendering from sparse input views. While learning-based Multi-View Stereo (MVS) and monocular depth estimators (MDEs) provide geometry priors, directly integrating them with 4DGS yields suboptimal results due to the ill-posed nature of sparse-input 4D geometric optimization. To address these problems, we introduce a dynamic consistency checking strategy that reduces the estimation uncertainties of MVS across spacetime. Furthermore, we propose a global-local depth regularization approach that distills spatiotemporally consistent geometric information from monocular depths, thereby enhancing coherent geometry and appearance learning within the 4D volume. Extensive experiments on the popular N3DV and Technicolor datasets validate the effectiveness of GC-4DGS in rendering quality without sacrificing efficiency. Notably, our method outperforms RF-DeRF, the latest dynamic radiance field tailored for sparse-input dynamic view synthesis, and the original 4DGS by 2.62 dB and 1.58 dB in PSNR, respectively, while remaining deployable on resource-constrained IoT edge devices.
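The dynamic consistency checking strategy is not specified beyond the abstract. A minimal sketch of a standard multi-view geometric consistency test (forward-backward reprojection between a reference and a source view) is given below, under the assumption that the paper's strategy extends such per-frame checks across time. Function names, thresholds, and the NumPy formulation are illustrative, not the authors' code.

```python
import numpy as np


def check_depth_consistency(depth_ref, depth_src, K_ref, K_src, T_ref2src,
                            pix_thresh=1.0, rel_depth_thresh=0.01):
    """Hypothetical geometric consistency mask between two views (not the paper's code).

    A reference pixel is kept only if its 3D point, projected into the source view
    and reprojected back using the source depth, lands close to where it started
    (pixel error) and the round-trip depth agrees (relative depth error).
    depth_ref, depth_src: (H, W); K_*: 3x3 intrinsics; T_ref2src: 4x4 pose.
    """
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Lift reference pixels to 3D in the reference camera frame.
    pix_ref = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    pts_ref = np.linalg.inv(K_ref) @ pix_ref * depth_ref.reshape(1, -1)

    # Transform into the source frame and project.
    pts_src = T_ref2src[:3, :3] @ pts_ref + T_ref2src[:3, 3:4]
    proj = K_src @ pts_src
    u_src = (proj[0] / proj[2]).reshape(H, W)
    v_src = (proj[1] / proj[2]).reshape(H, W)

    # Sample the source depth at the projected locations (nearest neighbour).
    ui = np.clip(np.round(u_src).astype(int), 0, W - 1)
    vi = np.clip(np.round(v_src).astype(int), 0, H - 1)
    d_sampled = depth_src[vi, ui]

    # Reproject back to the reference view using the sampled source depth.
    pix_src = np.stack([u_src, v_src, np.ones_like(u_src)]).reshape(3, -1)
    pts_back_src = np.linalg.inv(K_src) @ pix_src * d_sampled.reshape(1, -1)
    T_src2ref = np.linalg.inv(T_ref2src)
    pts_back_ref = T_src2ref[:3, :3] @ pts_back_src + T_src2ref[:3, 3:4]
    proj_back = K_ref @ pts_back_ref
    u_back = (proj_back[0] / proj_back[2]).reshape(H, W)
    v_back = (proj_back[1] / proj_back[2]).reshape(H, W)
    d_back = pts_back_ref[2].reshape(H, W)

    pix_err = np.sqrt((u_back - u) ** 2 + (v_back - v) ** 2)
    depth_err = np.abs(d_back - depth_ref) / np.maximum(depth_ref, 1e-8)
    return (pix_err < pix_thresh) & (depth_err < rel_depth_thresh)
```

Under this reading, MVS depth estimates that fail such a mask (per frame, and presumably across neighbouring timestamps in the dynamic setting) would be discarded or down-weighted before being used as geometric priors for the 4D Gaussians.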