AI Summary
This work addresses the high bandwidth demand of transmitting volumetric representations for free-viewpoint video. The authors propose CATRF, a compression framework that quantizes and packs TriPlane radiance field features into codec-friendly canvases compatible with standard image and video codecs (e.g., JPEG, VP9, HEVC, AV1). During training, a straight-through estimator (STE) places the non-differentiable encode-decode roundtrip inside the optimization loop, so the features adapt to real codec distortions without any learned codec parameters. On static and dynamic benchmarks, the method achieves a better rate-distortion trade-off than both codec-agnostic and learned-codec-in-the-loop baselines, and it outperforms recent compressed 3D Gaussian Splatting methods in compression efficiency and decoding speed, pointing toward 2D video-like bitrates.
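The quantize-and-pack step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8-bit depth, the fixed value range `[lo, hi]`, and the side-by-side tiling layout are all assumptions chosen for illustration, and the actual packing strategy may differ.

```python
# Hypothetical helpers: names, 8-bit depth and tiling layout are assumptions.

def quantize(plane, lo, hi, levels=256):
    """Map floats in [lo, hi] to integers in [0, levels - 1], clamping outliers."""
    scale = (levels - 1) / (hi - lo)
    return [[min(levels - 1, max(0, round((v - lo) * scale))) for v in row]
            for row in plane]

def dequantize(qplane, lo, hi, levels=256):
    """Map integer codes back to floats in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    return [[lo + q * step for q in row] for row in qplane]

def pack(planes):
    """Tile equally sized planes side by side into one 2D canvas."""
    height = len(planes[0])
    return [sum((p[r] for p in planes), []) for r in range(height)]

def unpack(canvas, count):
    """Split a canvas produced by pack() back into `count` planes."""
    width = len(canvas[0]) // count
    return [[row[i * width:(i + 1) * width] for row in canvas]
            for i in range(count)]

# Three toy 2x2 feature planes (e.g. XY, XZ, YZ), values assumed to lie in [-1, 1].
planes = [
    [[-1.0, 0.0], [0.5, 1.0]],
    [[0.25, -0.25], [0.75, -0.75]],
    [[0.1, 0.2], [0.3, 0.4]],
]
canvas = pack([quantize(p, -1.0, 1.0) for p in planes])   # 2 rows x 6 columns of 8-bit codes
restored = [dequantize(q, -1.0, 1.0) for q in unpack(canvas, 3)]
```

In a real pipeline the integer canvas would be handed to an image or video encoder at this point. Here the roundtrip is lossless apart from the quantization error, which is at most half a quantization step per value.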
Abstract
Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off than codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.
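The straight-through estimator described in the abstract can be sketched on a toy problem. Everything in this example is an assumption made for illustration: `codec_roundtrip` is a coarse uniform quantizer standing in for a real JPEG/VP9/HEVC/AV1 encode-decode pass, the loss is a plain squared error rather than a rendering loss, and the gradients are written out by hand instead of using an autodiff framework.

```python
# Toy STE sketch. The "codec" is a stand-in quantizer, not a real codec.

def codec_roundtrip(features, step=0.25):
    """Hypothetical stand-in for encode + decode: snap values to a coarse grid."""
    return [round(v / step) * step for v in features]

def train_with_ste(features, target, lr=0.1, iters=200, step=0.25):
    """Fit `features` so that their *decoded* version matches `target`."""
    features = list(features)
    for _ in range(iters):
        # Forward pass goes through the non-differentiable roundtrip.
        decoded = codec_roundtrip(features, step)
        # Gradient of sum((decoded - target)^2) with respect to `decoded`.
        grad_decoded = [2.0 * (d - t) for d, t in zip(decoded, target)]
        # STE: pretend d(decoded)/d(features) is the identity, so the same
        # gradient is applied directly to the pre-codec features.
        features = [f - lr * g for f, g in zip(features, grad_decoded)]
    return features

target = [0.5, -0.25, 1.0]
learned = train_with_ste([0.0, 0.0, 0.0], target)
decoded = codec_roundtrip(learned)
```

Without the identity substitution, the true derivative of the rounding step is zero almost everywhere, so the features would never move. With it, the stored features drift until their decoded values land on the target, which mirrors the idea of letting features adapt to the distortions the client-side decoder will actually produce.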