🤖 AI Summary
To address the dual challenges of efficient video compression and real-time rendering, this paper proposes GSVC, a video representation method based on learnable 2D Gaussian splats. Methodologically, GSVC explores 2D Gaussian splatting as a new primitive for video, replacing conventional pixels or blocks with differentiable elliptical primitives. It predicts the Gaussian parameters of each frame from the previous frame to exploit temporal redundancy, prunes Gaussians that contribute little to video quality in order to trade file size against quality, and randomly adds Gaussians to fit content with large motion or newly appeared objects. Key frames are detected from loss differences during learning, which handles significant scene changes. Experimental results show that GSVC achieves rate-distortion performance comparable to AV1 and VVC while rendering 1920×1080 video at 1500 fps.
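To make the idea of representing a frame with 2D Gaussians concrete, the following is a minimal sketch of how a small grayscale frame could be rendered from a list of elliptical Gaussians. It assumes a simple additive blend and a scale-plus-rotation parameterization; the `Gaussian2D` class, its field names, and the `render` function are hypothetical and are not taken from the paper, whose actual rasterizer is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    # Hypothetical parameterization, chosen for illustration only.
    cx: float      # center x (pixels)
    cy: float      # center y (pixels)
    sx: float      # standard deviation along the first principal axis
    sy: float      # standard deviation along the second principal axis
    theta: float   # rotation of the principal axes, in radians
    weight: float  # grayscale intensity contributed at the center


def render(gaussians, width, height):
    """Render a grayscale frame by summing every Gaussian at every pixel.

    Additive blending is an assumption of this sketch.
    """
    frame = [[0.0] * width for _ in range(height)]
    for g in gaussians:
        cos_t, sin_t = math.cos(g.theta), math.sin(g.theta)
        for y in range(height):
            for x in range(width):
                dx, dy = x - g.cx, y - g.cy
                # Rotate the pixel offset into the ellipse's principal axes.
                u = cos_t * dx + sin_t * dy
                v = -sin_t * dx + cos_t * dy
                falloff = math.exp(-0.5 * ((u / g.sx) ** 2 + (v / g.sy) ** 2))
                frame[y][x] += g.weight * falloff
    return frame


# A single isotropic Gaussian centered in a 5x5 frame.
frame = render([Gaussian2D(2.0, 2.0, 1.0, 1.0, 0.0, 1.0)], width=5, height=5)
```

In a learned setting, the parameters of each Gaussian would be optimized by gradient descent against the target frame; this nested-loop version only illustrates the forward pass and is far too slow for real video.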
📝 Abstract
3D Gaussian splats have emerged as a revolutionary, effective, learned representation for static 3D scenes. In this work, we explore using 2D Gaussian splats as a new primitive for representing videos. We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames. GSVC incorporates the following techniques: (i) To exploit temporal redundancy among adjacent frames, which can speed up training and improve compression efficiency, we predict the Gaussian splats of a frame based on its previous frame; (ii) To control the trade-off between file size and quality, we remove Gaussian splats with low contribution to the video quality; (iii) To capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly appeared objects; (iv) To handle significant changes in the scene, we detect key frames based on loss differences during the learning process. Experimental results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs such as AV1 and VVC, and a rendering speed of 1500 fps for a 1920×1080 video.
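Techniques (ii) and (iv) can be illustrated with a short sketch. The abstract does not specify how contribution is scored or how loss differences are thresholded, so everything below is an assumption: `detect_key_frames` flags a frame when its loss jumps by more than a hypothetical `threshold` over the previous frame's loss, and `prune_low_contribution` keeps a hypothetical `keep_ratio` fraction of splats ranked by a precomputed contribution score.

```python
def detect_key_frames(losses, threshold):
    """Return indices of frames treated as key frames.

    losses[i] is assumed to be the fitting loss of frame i when its splats
    are initialized from frame i-1. A jump larger than `threshold` is taken
    as a sign of a scene change (an illustrative rule, not the paper's).
    """
    key_frames = [0]  # the first frame has no predecessor to predict from
    for i in range(1, len(losses)):
        if losses[i] - losses[i - 1] > threshold:
            key_frames.append(i)
    return key_frames


def prune_low_contribution(contributions, keep_ratio):
    """Return the sorted indices of splats to keep.

    contributions[i] is an assumed per-splat score where higher means the
    splat matters more to video quality.
    """
    n_keep = max(1, int(len(contributions) * keep_ratio))
    ranked = sorted(range(len(contributions)),
                    key=lambda i: contributions[i], reverse=True)
    return sorted(ranked[:n_keep])


# Frame 3 shows a loss jump, so it is flagged along with frame 0.
key_frames = detect_key_frames([0.020, 0.021, 0.019, 0.090, 0.088], threshold=0.05)

# Keep the top half of four splats by contribution.
kept = prune_low_contribution([0.5, 0.01, 0.3, 0.02], keep_ratio=0.5)
```

Lowering `keep_ratio` would shrink the representation at the cost of quality, which is the file-size versus quality trade-off that technique (ii) is meant to control.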