EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation

📅 2025-03-07

📈 Citations: 0

✨ Influential: 0

career value

229K/year

🤖 AI Summary

To address the challenge of high-fidelity, memory-efficient reconstruction for long-term dynamic 3D scenes (e.g., complex human performances), this paper proposes a two-stage evolutionary 3D Gaussian representation. First, deformation-aware alignment ensures inter-frame geometric consistency for robust initialization; second, optical-flow-guided local optimization dynamically refines the Gaussian set via sparse point insertion and deletion. For the first time, our purely explicit representation unifies efficient temporal modeling and ultra-high compression (>50×), eliminating reliance on implicit neural networks or video encoders. Evaluated on multiple dynamic human datasets, our method significantly outperforms state-of-the-art approaches: +1.8 dB PSNR, −12% LPIPS, real-time rendering at >60 FPS, and over 50× reduction in storage overhead.

Technology Category

Application Category

📝 Abstract

We have recently seen great progress in 3D scene reconstruction through explicit point-based 3D Gaussian Splatting (3DGS), notable for its high quality and fast rendering speed. However, reconstructing dynamic scenes such as complex human performances with long durations remains challenging. Prior efforts fall short of modeling a long-term sequence with drastic motions, frequent topology changes or interactions with props, and resort to segmenting the whole sequence into groups of frames that are processed independently, which undermines temporal stability and thereby leads to an unpleasant viewing experience and inefficient storage footprint. In view of this, we introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to coarsely align with the target frame, and then refines it with minimal point addition/subtraction, particularly in fast-changing areas. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics while maintaining fast rendering through its purely explicit representation. Moreover, by exploiting temporal coherence between successive frames, we propose a simple yet effective compression algorithm that achieves over 50x compression rate. Extensive experiments on both public benchmarks and challenging custom datasets demonstrate that our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.

Problem

Research questions and friction points this paper is trying to address.

Reconstructing dynamic scenes with complex human performances

Maintaining temporal stability in long-duration sequences

Achieving high compression rates for efficient storage

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage strategy for dynamic scene reconstruction

Incremental evolving 3D Gaussian representation

Effective compression algorithm exploiting temporal coherence

🔎 Similar Papers

SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length