🤖 AI Summary
To address the challenge of high-fidelity, memory-efficient reconstruction for long-term dynamic 3D scenes (e.g., complex human performances), this paper proposes a two-stage evolutionary 3D Gaussian representation. First, deformation-aware alignment ensures inter-frame geometric consistency for robust initialization; second, optical-flow-guided local optimization dynamically refines the Gaussian set via sparse point insertion and deletion. For the first time, our purely explicit representation unifies efficient temporal modeling and ultra-high compression (>50×), eliminating reliance on implicit neural networks or video encoders. Evaluated on multiple dynamic human datasets, our method significantly outperforms state-of-the-art approaches: +1.8 dB PSNR, −12% LPIPS, real-time rendering at >60 FPS, and over 50× reduction in storage overhead.
📝 Abstract
We have recently seen great progress in 3D scene reconstruction through explicit point-based 3D Gaussian Splatting (3DGS), notable for its high quality and fast rendering speed. However, reconstructing dynamic scenes such as complex human performances with long durations remains challenging. Prior efforts fall short of modeling a long-term sequence with drastic motions, frequent topology changes or interactions with props, and resort to segmenting the whole sequence into groups of frames that are processed independently, which undermines temporal stability and thereby leads to an unpleasant viewing experience and inefficient storage footprint. In view of this, we introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to coarsely align with the target frame, and then refines it with minimal point addition/subtraction, particularly in fast-changing areas. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics while maintaining fast rendering through its purely explicit representation. Moreover, by exploiting temporal coherence between successive frames, we propose a simple yet effective compression algorithm that achieves over 50x compression rate. Extensive experiments on both public benchmarks and challenging custom datasets demonstrate that our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.