🤖 AI Summary
To address the poor generalizability of free-viewpoint video (FVV) compression for dynamic 3D Gaussian splatting (DGS), caused by the tight coupling between reconstruction and coding, this paper proposes D-FCGS, a feedforward compression framework. Methodologically, it adopts an I-P frame structure and extracts sparse control points to explicitly model motion; motion compensation is then guided by these points, while a dual prior-aware entropy model jointly leverages hyperpriors and spatiotemporal priors. This fully decouples reconstruction from coding. Contributions include: (1) the first dynamic Gaussian point cloud compression method that requires no per-scene optimization and generalizes across scenes; (2) compression ratios exceeding 40× in under 2 seconds; (3) rate-distortion performance on par with optimization-based methods; and (4) strong multi-view visual consistency, significantly advancing the practical transmission and storage of dynamic 3D Gaussian representations.
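The role of the prior-aware entropy model can be illustrated with a minimal sketch: a predicted Gaussian distribution (in D-FCGS, its mean and scale would come from networks conditioned on the hyperprior and spatiotemporal context) is integrated over each quantization bin to estimate the rate of a quantized motion tensor. All function names and the hand-set parameters below are assumptions for illustration, not the paper's actual model.

```python
# Toy prior-aware rate estimation: score a quantized motion tensor by
# integrating a predicted Gaussian over each quantization bin. A sharper,
# better-matched prior (as a hyperprior + spatiotemporal context would
# provide) yields a lower bit estimate than a vague one.
import math

def bin_prob(q, mu, sigma):
    """Probability mass of the bin [q - 0.5, q + 0.5) under N(mu, sigma^2)."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(q + 0.5) - cdf(q - 0.5)

def estimate_bits(quantized, mus, sigmas):
    """Rate estimate: sum of -log2 p over all quantized symbols."""
    return sum(-math.log2(max(bin_prob(q, m, s), 1e-12))
               for q, m, s in zip(quantized, mus, sigmas))

motion = [0.2, -1.1, 0.05, 2.4]     # toy per-point motion components
q = [round(m) for m in motion]      # scalar quantization

# Hand-set priors for illustration: a sharp, well-matched prior vs. a
# vague zero-mean prior. In D-FCGS both would be network predictions.
bits_sharp = estimate_bits(q, mus=q, sigmas=[0.5] * 4)
bits_vague = estimate_bits(q, mus=[0.0] * 4, sigmas=[4.0] * 4)
```

The gap between `bits_sharp` and `bits_vague` is the point of combining priors: the more accurately the entropy model predicts each symbol's distribution, the fewer bits the arithmetic coder needs.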
📝 Abstract
Free-viewpoint video (FVV) enables immersive 3D experiences, but efficient compression of dynamic 3D representations remains a major challenge. Recent advances in 3D Gaussian Splatting (3DGS) and its dynamic extensions have enabled high-fidelity scene modeling. However, existing methods often couple scene reconstruction with optimization-dependent coding, which limits generalizability. This paper presents Feedforward Compression of Dynamic Gaussian Splatting (D-FCGS), a novel feedforward framework for compressing temporally correlated Gaussian point cloud sequences. Our approach introduces a Group-of-Frames (GoF) structure with I-P frame coding, where inter-frame motions are extracted via sparse control points. The resulting motion tensors are compressed in a feedforward manner using a dual prior-aware entropy model that combines hyperprior and spatial-temporal priors for accurate rate estimation. For reconstruction, we perform control-point-guided motion compensation and employ a refinement network to enhance view-consistent fidelity. Trained on multi-view video-derived Gaussian frames, D-FCGS generalizes across scenes without per-scene optimization. Experiments show that it matches the rate-distortion performance of optimization-based methods, achieving over 40 times compression in under 2 seconds while preserving visual quality across viewpoints. This work advances feedforward compression for dynamic 3DGS, paving the way for scalable FVV transmission and storage in immersive applications.
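The Group-of-Frames coding loop described above can be sketched in miniature: code the I-frame independently, then for each P-frame extract sparse control-point motions relative to the previous decoded frame and reconstruct via motion compensation. Everything below is a toy stand-in for the paper's learned components (control-point selection, compensation, and the refinement network are replaced by subsampling and nearest-anchor propagation), and all names are illustrative assumptions.

```python
# Toy GoF I-P loop over Gaussian centers (N x 3): only sparse control-point
# motions would need to be entropy-coded for each P-frame.
import numpy as np

rng = np.random.default_rng(0)

def extract_control_points(prev, curr, k=4):
    """Pick k sparse control points and their motions.
    Toy version: uniform subsampling; D-FCGS learns this step."""
    idx = np.linspace(0, len(prev) - 1, k).astype(int)
    return idx, curr[idx] - prev[idx]

def motion_compensate(prev, idx, motions):
    """Propagate sparse control-point motion to every Gaussian.
    Toy version: copy the nearest control point's motion; the paper
    uses control-point-guided compensation plus a refinement network."""
    anchors = prev[idx]
    out = prev.copy()
    for i, p in enumerate(prev):
        j = np.argmin(np.linalg.norm(anchors - p, axis=1))
        out[i] = p + motions[j]
    return out

# A GoF: one I-frame followed by P-frames (here, a rigid drift of centers).
N = 64
i_frame = rng.normal(size=(N, 3))
gof = [i_frame] + [i_frame + 0.1 * t for t in range(1, 4)]

decoded = [gof[0]]                 # I-frame: coded independently
for frame in gof[1:]:              # P-frames: code only sparse motion
    idx, motions = extract_control_points(decoded[-1], frame)
    decoded.append(motion_compensate(decoded[-1], idx, motions))

# With a global rigid motion, sparse compensation is exact in this toy setup.
err = max(np.abs(d - f).max() for d, f in zip(decoded, gof))
```

Because the toy motion is a uniform translation, four control points reconstruct every P-frame essentially exactly; real scenes need the learned compensation and refinement stages, but the rate saving comes from the same structure: P-frames carry only a sparse motion tensor instead of a full Gaussian frame.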