🤖 AI Summary
High-fidelity reconstruction and real-time rendering of large-scale dynamic urban scenes remain challenging, as existing methods—relying on separate static/dynamic modeling—fail to capture their coupled spatiotemporal interactions.
Method: We propose Periodic Vibration Gaussian (PVG), a unified 3D Gaussian splatting representation that explicitly embeds temporal dynamics: (i) periodic vibration-based temporal dynamics for city-scale spatiotemporal modeling; (ii) a temporal smoothing mechanism and a position-aware adaptive control strategy to enhance temporal consistency under sparse observations and generalization to large scenes.
Results: PVG achieves significant improvements over state-of-the-art methods on Waymo and KITTI benchmarks—without requiring bounding-box annotations or optical-flow supervision—while attaining rendering speeds 900× faster than the best baseline.
📝 Abstract
Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent and large scene representation learning with sparse training data, we introduce a novel temporal smoothing mechanism and a position-aware adaptive control strategy respectively. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 900-fold acceleration in rendering over the best alternative.
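As a rough illustration (not the authors' code), the "periodic vibration" idea from the abstract can be sketched as a Gaussian whose center oscillates around a mean position and whose opacity peaks at a characteristic time. The parameter names below (`mu`, `A`, `tau`, `l`, `beta`) are assumptions introduced for this sketch:

```python
import numpy as np

def pvg_position(mu, A, t, tau, l):
    """Hypothetical sketch: a Gaussian's center vibrates periodically
    around its mean position `mu` with amplitude `A` and cycle length `l`,
    phase-aligned to its life-peak time `tau`."""
    return mu + A * np.sin(2.0 * np.pi * (t - tau) / l)

def pvg_opacity(o, t, tau, beta):
    """Opacity is modulated by a temporal envelope that peaks at `tau`
    and decays with lifespan `beta`, so each Gaussian contributes most
    near its own moment in the sequence."""
    return o * np.exp(-0.5 * ((t - tau) / beta) ** 2)
```

Under this parameterization, a static element is simply the limiting case of negligible vibration amplitude and a very long lifespan, which is one way a single representation can cover both static and dynamic content.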