Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficiently compressing 3D Gaussian human avatars for video at ultra-low bitrates. Methodologically, it introduces a lightweight, disentangled compression framework that separates appearance from motion: a static canonical portrait—modeled via articulated splatting—serves as the appearance base, while dynamic deformation is represented by only 94 time-varying parameters; these parameters drive differentiable deformation from the canonical space to target views using a human prior template and linear blend skinning (LBS). Appearance is shared across frames, and motion parameters are minimally encoded. The key contribution is the first deep integration of explicit human structural priors into 3D Gaussian compression, substantially reducing redundancy. Experiments demonstrate state-of-the-art rate-distortion performance on mainstream multi-view human portrait datasets, outperforming both conventional 2D/3D codecs and existing learnable 3D Gaussian compression methods.

📝 Abstract
This paper proposes an efficient 3D avatar coding framework that leverages compact human priors and a canonical-to-target transformation to enable high-quality 3D human avatar video compression at ultra-low bit rates. The framework begins by training a canonical Gaussian avatar using articulated splatting in a network-free manner, which serves as the foundation for avatar appearance modeling. Simultaneously, a human-prior template is employed to capture temporal body movements through compact parametric representations. This decomposition of appearance and temporal evolution minimizes redundancy, enabling efficient compression: the canonical avatar is shared across the sequence, requiring compression only once, while the temporal parameters, consisting of just 94 parameters per frame, are transmitted at minimal bit rate. For each frame, the target human avatar is generated by deforming the canonical avatar via a Linear Blend Skinning (LBS) transformation, facilitating temporally coherent video reconstruction and novel view synthesis. Experimental results demonstrate that the proposed method significantly outperforms conventional 2D/3D codecs and existing learnable dynamic 3D Gaussian splatting compression methods in terms of rate-distortion performance on mainstream multi-view human video datasets, paving the way for seamless immersive multimedia experiences in metaverse applications.
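The canonical-to-target deformation in the abstract is standard Linear Blend Skinning: each canonical point is transformed by every joint's rigid motion, and the results are blended with per-point skinning weights. The sketch below is a minimal illustration of that operation only; the function name, array shapes, and use of plain numpy are assumptions for exposition, not the paper's implementation (which applies this to Gaussian primitives driven by a human prior template).

```python
import numpy as np

def lbs_deform(points, weights, joint_rotations, joint_translations):
    """Deform canonical points to a target pose via Linear Blend Skinning.

    points:             (N, 3) canonical positions (e.g. Gaussian centers)
    weights:            (N, J) per-point skinning weights, each row sums to 1
    joint_rotations:    (J, 3, 3) per-joint rotation matrices
    joint_translations: (J, 3) per-joint translations
    """
    # Apply every joint's rigid transform to every point: (J, N, 3)
    transformed = (
        np.einsum('jab,nb->jna', joint_rotations, points)
        + joint_translations[:, None, :]
    )
    # Blend the per-joint results with the skinning weights: (N, 3)
    return np.einsum('nj,jna->na', weights, transformed)
```

With identity rotations and zero translations the canonical points are returned unchanged, which is the expected fixed point of the deformation.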
Problem

Research questions and friction points this paper is trying to address.

Compressing 3D human avatar videos at ultra-low bit rates
Separating appearance and motion for efficient video compression
Improving rate-distortion performance over conventional 2D/3D codecs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging compact human priors for compression
Using canonical-to-target transformation for avatars
Employing Linear Blend Skinning for temporal coherence
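The ultra-low-bitrate claim can be made concrete with a back-of-the-envelope calculation on the 94 per-frame motion parameters. The quantization precision and frame rate below are assumptions for illustration, not figures from the paper, and the result ignores the one-time cost of the shared canonical avatar and any entropy coding of the parameters.

```python
# Rough raw motion payload for 94 parameters per frame, assuming
# (not from the paper) 16-bit quantization and 30 fps playback.
PARAMS_PER_FRAME = 94
BITS_PER_PARAM = 16   # assumed quantization precision
FPS = 30              # assumed frame rate

bits_per_frame = PARAMS_PER_FRAME * BITS_PER_PARAM   # 1504 bits per frame
motion_kbps = bits_per_frame * FPS / 1000            # 45.12 kbps raw
print(bits_per_frame, motion_kbps)
```

Even uncompressed, the per-frame motion stream is on the order of tens of kilobits per second, which is why sharing the appearance once and transmitting only motion parameters yields such favorable rate-distortion behavior.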
Authors

Shanzhi Yin (City University of Hong Kong)
Bolin Chen (DAMO Academy, Alibaba Group; HuPan Laboratory; Fudan University)
Xinju Wu (City University of Hong Kong)
Ru-Ling Liao (DAMO Academy, Alibaba Group)
Jie Chen (DAMO Academy, Alibaba Group; HuPan Laboratory)
Shiqi Wang (City University of Hong Kong)
Yan Ye (Alibaba Inc)