🤖 AI Summary
This work addresses the prohibitive startup latency of existing 4D Gaussian Splatting (4DGS) models under mobile streaming, where waits range from tens to hundreds of seconds. The latency arises because a client must download the full bitstream before rendering any frame, which also rules out adaptive delivery over fluctuating bandwidth. To overcome this, we propose PD-4DGS, the first progressive compression and on-demand streaming framework for 4DGS. Its Hierarchical Deformation Decomposition (HDD) splits motion into three independently transmittable layers (a static scaffold, a global deformation, and a local refinement), so that any prefix of the bitstream is renderable and the stream remains compatible with standard DASH/HLS delivery. A Gaussian-entropy attribute rate-distortion loss and a temporal mask consistency regulariser shrink the base layer and suppress low-bitrate flicker, while a capacity-weighted rollout schedule, gated by a learned activation rate, prevents under-training of the deformation network without per-scene hyperparameters. On the Dycheck iPhone benchmark, the method reduces the streamed bitstream by more than 60% at matched rendering quality and cuts first-frame latency from 73–930 seconds to approximately 1.7 seconds on a 2 Mbps link.
📝 Abstract
4D Gaussian Splatting (4DGS) enables high-quality dynamic novel view synthesis, yet current models are delivered as monolithic bitstreams that clients must download in full before any frame can be rendered. This causes black-screen waits of tens to hundreds of seconds on mobile bandwidth and leaves 4DGS incompatible with modern adaptive-bitrate delivery. Progressive 3DGS compression alleviates this for static scenes, but it acts only on spatial anchors and cannot partition the temporal deformation networks that dominate dynamic-scene size. We present PD-4DGS, the first framework for progressive compression and on-demand transmission of 4DGS. Hierarchical Deformation Decomposition (HDD) externalises the coarse-to-fine motion hierarchy already latent in 4DGS into three independently transmittable layers -- a static scaffold, a global deformation, and a local refinement -- so that any prefix of the bitstream is already renderable, turning a single training run into a scalable, DASH/HLS-compatible bitstream. A Gaussian-entropy attribute rate-distortion loss, together with a temporal mask consistency regulariser, shrinks the base layer while suppressing low-bitrate flicker; a capacity-weighted rollout schedule, gated online by a learnt activation rate rho, then prevents deformation-network under-training without any per-scene hyperparameter. On the Dycheck iPhone benchmark, PD-4DGS cuts the streamed bitstream by >60% at matched rendering fidelity and reduces first-frame latency from 73--930 s to ~1.7 s on a 2 Mbps link, enabling on-demand progressive streaming for 4DGS.
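To make the latency figures concrete, the following is a minimal sketch of the underlying arithmetic, not the authors' implementation. It assumes that first-frame latency is dominated by download time (size in bits divided by link rate), which the abstract does not state explicitly. The function names and the per-layer sizes in the usage example are hypothetical and chosen only for illustration; only the 2 Mbps link and the 73 s, 930 s, and 1.7 s latencies come from the abstract.

```python
def implied_size_megabytes(latency_s: float, link_mbps: float) -> float:
    """Payload size in MB (10^6 bytes) that takes latency_s seconds at link_mbps.

    Assumption: latency is pure transfer time, with no decode or protocol overhead.
    """
    return latency_s * link_mbps / 8.0


def renderable_layers(layer_sizes_mb, link_mbps, wait_budget_s):
    """Return the longest prefix of layers that downloads within wait_budget_s.

    layer_sizes_mb is an ordered list of (name, size_in_MB) pairs, base layer first.
    With a prefix-decodable stream, every returned prefix is assumed renderable.
    """
    received = []
    elapsed_s = 0.0
    for name, size_mb in layer_sizes_mb:
        elapsed_s += size_mb * 8.0 / link_mbps  # MB -> megabits -> seconds
        if elapsed_s > wait_budget_s:
            break
        received.append(name)
    return received


if __name__ == "__main__":
    # Latencies reported in the abstract, converted to implied payload sizes.
    for latency_s in (73.0, 930.0, 1.7):
        print(latency_s, "s ->", implied_size_megabytes(latency_s, 2.0), "MB")

    # Hypothetical layer sizes, for illustration only (not from the paper).
    layers = [
        ("static scaffold", 0.4),
        ("global deformation", 1.0),
        ("local refinement", 3.0),
    ]
    print(renderable_layers(layers, link_mbps=2.0, wait_budget_s=6.0))
```

Under the pure-transfer-time assumption, the reported 73 to 930 s waits on a 2 Mbps link correspond to monolithic payloads of roughly 18 to 233 MB, whereas a 1.7 s first frame corresponds to a base layer of under half a megabyte. The second function illustrates why prefix renderability matters for adaptive delivery: the client can start from whichever layers fit its wait budget and refine as later layers arrive.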