AI Summary
Problem: Existing 3D Gaussian Splatting (3DGS) representations lack human priors, leading to suboptimal compression efficiency and poor reconstruction quality for digital humans. Method: We propose a hierarchical Gaussian compression framework that, for the first time, integrates the SMPL-X human model into 3DGS, decoupling a structural layer (static geometry and semantics) from a motion layer (pose-driven deformation) to enable layered compression and progressive decoding. We further incorporate a StyleUNet-based generator with facial attention to preserve fine-grained facial details and semantic consistency at low bitrates, and support controllable multi-pose rendering driven by video or text. Contribution/Results: End-to-end joint optimization of the inter-layer representations achieves a better trade-off between compression ratio and visual fidelity. Our method outperforms state-of-the-art approaches on multiple benchmarks, enabling real-time streaming and dynamic rendering of high-fidelity digital humans.
Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have enabled fast, photorealistic rendering of dynamic 3D scenes, showing strong potential in immersive communication. However, in digital human encoding and transmission, compression methods based on general 3DGS representations lack human priors, which results in suboptimal bitrate efficiency and reconstruction quality at the decoder side and hinders their application in streamable 3D avatar systems. We propose HGC-Avatar, a novel Hierarchical Gaussian Compression framework designed for efficient transmission and high-quality rendering of dynamic avatars. Our method disentangles the Gaussian representation into a structural layer, which maps poses to Gaussians via a StyleUNet-based generator, and a motion layer, which leverages the SMPL-X model to represent temporal pose variations compactly and semantically. This hierarchical design supports layer-wise compression, progressive decoding, and controllable rendering from diverse pose inputs such as video sequences or text. Because viewers care most about facial realism, we incorporate a facial attention mechanism during StyleUNet training to preserve identity and expression details under low-bitrate constraints. Experimental results demonstrate that HGC-Avatar provides a streamable solution for rapid 3D avatar rendering, while significantly outperforming prior methods in both visual quality and compression efficiency.
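The abstract does not describe a bitstream format, so the following is only a minimal sketch of how a two-layer stream of this kind might be organized: a structural layer sent once, followed by small per-frame motion packets that carry quantized pose parameters. Every name here (`encode_motion_packet`, `decode_motion_packet`, `stream`), the 16-bit uniform quantizer, the packet layout, and the pose dimension of 165 (assumed as 55 joints with 3 axis-angle values each) is a hypothetical choice for illustration, not the authors' codec.

```python
import math
import struct

POSE_DIM = 165         # assumption: 55 joints x 3 axis-angle values per frame
ANGLE_RANGE = math.pi  # assumption: each pose value is clipped to [-pi, pi]
LEVELS = (1 << 16) - 1 # 16-bit uniform quantizer (illustrative choice)


def quantize_pose(pose):
    """Clip each value to [-ANGLE_RANGE, ANGLE_RANGE] and map it to 0..65535."""
    codes = []
    for v in pose:
        v = max(-ANGLE_RANGE, min(ANGLE_RANGE, v))
        codes.append(round((v + ANGLE_RANGE) / (2 * ANGLE_RANGE) * LEVELS))
    return codes


def dequantize_pose(codes):
    """Invert quantize_pose up to half a quantization step."""
    return [c / LEVELS * 2 * ANGLE_RANGE - ANGLE_RANGE for c in codes]


def encode_motion_packet(frame_idx, pose):
    """Pack a frame index (uint32) and the quantized pose (uint16 each)."""
    codes = quantize_pose(pose)
    return struct.pack(f"<I{len(codes)}H", frame_idx, *codes)


def decode_motion_packet(packet):
    """Recover the frame index and an approximate pose vector."""
    n = (len(packet) - 4) // 2
    frame_idx, *codes = struct.unpack(f"<I{n}H", packet)
    return frame_idx, dequantize_pose(codes)


def stream(structural_blob, poses):
    """Yield the structural layer once, then one motion packet per frame."""
    yield ("structural", structural_blob)
    for i, pose in enumerate(poses):
        yield ("motion", encode_motion_packet(i, pose))


if __name__ == "__main__":
    poses = [[0.01 * f] * POSE_DIM for f in range(3)]
    for kind, payload in stream(b"<generator weights placeholder>", poses):
        print(kind, len(payload), "bytes")
```

Under these assumptions each motion packet has a fixed size of 4 + 2 × 165 bytes, independent of how many Gaussians the decoder generates. That is the intuition behind transmitting poses rather than per-frame Gaussian attributes. A decoder could begin rendering as soon as the structural layer and the first motion packet arrive, which is one way to read "progressive decoding" here.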