HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars

2025-10-18
AI Summary
Existing 3D Gaussian Splatting (3DGS) representations lack human priors, leading to suboptimal compression efficiency and poor reconstruction quality for digital humans. Method: We propose a hierarchical Gaussian compression framework that, for the first time, integrates the SMPL-X human model into 3DGS, decoupling a structural layer (static geometry and semantics) from a motion layer (pose-driven deformation) to enable layered compression and progressive decoding. We further incorporate a StyleUNet-based generator with facial attention to preserve fine-grained facial details and semantic consistency under low bitrates, and support multi-pose controllable rendering driven by video or text. Contribution/Results: End-to-end joint optimization of inter-layer representations achieves superior trade-offs between compression ratio and visual fidelity. Our method outperforms state-of-the-art approaches on multiple benchmarks, enabling real-time streaming and dynamic rendering of high-fidelity digital humans.

Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have enabled fast, photorealistic rendering of dynamic 3D scenes, showing strong potential in immersive communication. However, in digital human encoding and transmission, the compression methods based on general 3DGS representations are limited by the lack of human priors, resulting in suboptimal bitrate efficiency and reconstruction quality at the decoder side, which hinders their application in streamable 3D avatar systems. We propose HGC-Avatar, a novel Hierarchical Gaussian Compression framework designed for efficient transmission and high-quality rendering of dynamic avatars. Our method disentangles the Gaussian representation into a structural layer, which maps poses to Gaussians via a StyleUNet-based generator, and a motion layer, which leverages the SMPL-X model to represent temporal pose variations compactly and semantically. This hierarchical design supports layer-wise compression, progressive decoding, and controllable rendering from diverse pose inputs such as video sequences or text. Since people are most concerned with facial realism, we incorporate a facial attention mechanism during StyleUNet training to preserve identity and expression details under low-bitrate constraints. Experimental results demonstrate that HGC-Avatar provides a streamable solution for rapid 3D avatar rendering, while significantly outperforming prior methods in both visual quality and compression efficiency.
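The layered design described in the abstract implies a simple transmission budget: the structural layer is sent once, while the motion layer sends a compact pose vector per frame. The sketch below is a back-of-the-envelope illustration of that arithmetic only, not the paper's codec. The class and function names, the structural payload size, the 16-bit quantization, and the 178-value pose vector (165 axis-angle values, 10 expression coefficients, and 3 translation values, a common SMPL-X configuration) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamConfig:
    """Hypothetical parameters for a two-layer avatar stream."""
    structural_bytes: int  # one-off payload, e.g. generator weights (assumed size)
    pose_dims: int         # motion-layer values sent per frame (assumed)
    bits_per_param: int    # quantization depth per value (assumed)
    fps: int               # frames per second


def motion_bitrate_kbps(cfg: StreamConfig) -> float:
    """Steady-state bitrate of the per-frame motion layer, in kilobits per second."""
    return cfg.pose_dims * cfg.bits_per_param * cfg.fps / 1000.0


def total_bytes(cfg: StreamConfig, seconds: float) -> float:
    """Total payload: structural layer once, plus motion layer for every frame."""
    per_frame_bytes = cfg.pose_dims * cfg.bits_per_param / 8
    return cfg.structural_bytes + per_frame_bytes * cfg.fps * seconds


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the paper.
    cfg = StreamConfig(structural_bytes=50_000_000, pose_dims=178,
                       bits_per_param=16, fps=30)
    print(f"motion layer: {motion_bitrate_kbps(cfg):.2f} kbps")
    print(f"10 s stream:  {total_bytes(cfg, 10):.0f} bytes")
```

Under these assumed numbers the per-frame cost is a few hundred bytes, so the one-off structural payload dominates short streams and is amortized over longer ones.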
Problem

Research questions and friction points this paper is trying to address.

Compressing dynamic 3D avatars for efficient transmission
Improving reconstruction quality under low-bitrate constraints
Enabling streamable rendering from diverse pose inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Gaussian Compression for dynamic avatars
Disentangles Gaussian representation into structural and motion layers
Incorporates facial attention mechanism for identity preservation
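The page does not say how the facial attention mechanism is implemented. One common way to bias training toward the face is to up-weight reconstruction error inside a face mask; the sketch below shows that idea as a plain-Python weighted L1 loss. It is a stand-in technique for illustration, not the authors' method, and the function name `face_weighted_l1`, the mask format, and the default weight of 5.0 are hypothetical.

```python
from typing import Sequence


def face_weighted_l1(pred: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]],
                     face_mask: Sequence[Sequence[int]],
                     face_weight: float = 5.0) -> float:
    """Weighted mean absolute error over a 2D image.

    Pixels where face_mask is truthy count face_weight times as much as
    the remaining pixels. The weight value is an assumption, not from the paper.
    """
    total = 0.0
    weight_sum = 0.0
    for p_row, t_row, m_row in zip(pred, target, face_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = face_weight if m else 1.0
            total += w * abs(p - t)
            weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    pred = [[0.0, 1.0], [1.0, 0.0]]
    target = [[0.0, 0.0], [1.0, 1.0]]
    mask = [[0, 1], [0, 0]]  # a single "face" pixel, top right
    print(face_weighted_l1(pred, target, mask))                   # weighted loss
    print(face_weighted_l1(pred, target, mask, face_weight=1.0))  # plain L1
```

Setting `face_weight` to 1.0 reduces this to an ordinary mean absolute error, which makes the effect of the face weighting easy to compare.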
Haocheng Tang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University. Beijing, China
Ruoke Yan
Peking University
Xinhui Yin
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University. Beijing, China
Qi Zhang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University. Beijing, China
Xinfeng Zhang
Fuxi AI Lab, NetEase Inc.
Vision-Language Models, Multimodal
Siwei Ma
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University. Beijing, China
Wen Gao
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University. Beijing, China
Chuanmin Jia
Peking University
Video Coding, Multimedia, Data Compression