GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

📅 2025-01-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key challenges in neural video representation—namely high memory consumption, slow training, and temporal inconsistency—this paper proposes an efficient explicit-implicit hybrid modeling framework for dynamic scenes. Methodologically, it introduces (1) a novel Neural ODE-based framework for learning smooth, continuous, and differentiable camera trajectories; and (2) a spatiotemporal hierarchical progressive optimization strategy that jointly leverages 3D Gaussian rasterization and hierarchical feature learning to balance reconstruction quality and convergence speed. Experiments demonstrate state-of-the-art rendering quality on videos of both low and high motion complexity, with a 2.1× inference speedup, a 58% reduction in GPU memory usage, and significantly improved inter-frame temporal consistency. The proposed framework establishes a new paradigm for efficient neural video representation, enabling practical applications in video compression and real-time interactive systems.
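The summary's mention of 3D Gaussian rasterization can be illustrated with a toy sketch: rendering a set of isotropic 2D Gaussians by front-to-back alpha compositing. This is not the paper's implementation (which uses tile-based 3D Gaussian splatting); all function and parameter names below are hypothetical, and the 2D isotropic setup is a deliberate simplification.

```python
import numpy as np

def render_gaussians(means, colors, opacities, scales, depths, H, W):
    """Toy front-to-back alpha compositing of isotropic 2D Gaussians.

    A simplified stand-in for tile-based 3D Gaussian rasterization:
    splats are sorted by depth, and each contributes color weighted by
    its opacity footprint and the remaining transmittance per pixel.
    """
    image = np.zeros((H, W, 3))
    transmittance = np.ones((H, W))      # fraction of light still unblocked
    ys, xs = np.mgrid[0:H, 0:W]
    for i in np.argsort(depths):         # composite front-to-back
        d2 = (xs - means[i][0]) ** 2 + (ys - means[i][1]) ** 2
        alpha = opacities[i] * np.exp(-d2 / (2 * scales[i] ** 2))
        image += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha
    return image

# Single red Gaussian centered in a 16x16 frame: the center pixel
# saturates to the splat's color, and far pixels stay near black.
img = render_gaussians(
    means=[(8.0, 8.0)],
    colors=[np.array([1.0, 0.0, 0.0])],
    opacities=[1.0],
    scales=[2.0],
    depths=[0.0],
    H=16, W=16,
)
```

Front-to-back ordering lets the running transmittance terminate contributions from occluded splats, which is what makes the explicit representation fast to render.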

📝 Abstract
Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training times, and temporal consistency. To address these issues, we introduce a novel neural video representation that combines 3D Gaussian splatting with continuous camera motion modeling. By leveraging Neural ODEs, our approach learns smooth camera trajectories while maintaining an explicit 3D scene representation through Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy, progressively refining spatial and temporal features to enhance reconstruction quality and accelerate convergence. This memory-efficient approach achieves high-quality rendering at impressive speeds. Experimental results show that our hierarchical learning, combined with robust camera motion modeling, captures complex dynamic scenes with strong temporal consistency, achieving state-of-the-art performance across diverse video datasets in both high- and low-motion scenarios.
Problem

Research questions and friction points this paper is trying to address.

Video Processing
Efficiency
Intelligent Video Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Ordinary Differential Equations
Spatial and Temporal Hierarchical Learning
3D Graphics and Camera Motion Integration
🔎 Similar Papers
2024-02-20 · International Conference on Machine Learning · Citations: 30