Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key bottlenecks in monocular video-driven 3D Gaussian Splatting (3DGS) for human avatars: time-consuming per-subject optimization and poor generalization under sparse input views. We propose the first general-purpose 3DGS framework for high-fidelity, animatable human reconstruction. Our method introduces: (1) a UV-aligned implicit identity map for spatially consistent identity encoding; (2) a decoupled multi-head U-Net architecture that explicitly models and jointly optimizes geometry, appearance, pose, and viewpoint attributes; and (3) integration of parametric human priors with UV-space feature encoding to enhance structural robustness under sparse-view conditions. The framework reconstructs a single subject in approximately 20 minutes—significantly faster than full optimization—while achieving comparable visual fidelity. It maintains stable rendering quality even under large pose variations and extreme camera viewpoints.
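To make the first component concrete, here is a minimal PyTorch sketch of what a UV-aligned latent identity map could look like: a learnable feature tensor defined on the body's UV atlas, sampled per Gaussian by bilinear interpolation. The class name, channel count, and resolution are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UVIdentityMap(nn.Module):
    """Learnable feature tensor aligned to the body's UV atlas.

    Hypothetical sketch: the paper's actual channel count, resolution,
    and sampling scheme are not specified here.
    """

    def __init__(self, channels: int = 32, resolution: int = 256):
        super().__init__()
        # Subject-specific latent map, optimized during per-subject adaptation.
        self.feature_map = nn.Parameter(
            torch.zeros(1, channels, resolution, resolution)
        )

    def forward(self, uv: torch.Tensor) -> torch.Tensor:
        """Sample per-Gaussian identity features at UV coordinates.

        uv: (N, 2) coordinates in [0, 1]^2 on the UV atlas.
        returns: (N, channels) identity features.
        """
        # grid_sample expects coordinates in [-1, 1] and a (B, H_out, W_out, 2) grid.
        grid = (uv * 2.0 - 1.0).view(1, 1, -1, 2)
        feats = F.grid_sample(self.feature_map, grid, align_corners=True)  # (1, C, 1, N)
        return feats.squeeze(0).squeeze(1).transpose(0, 1)  # (N, C)
```

Anchoring identity features in UV space gives every body surface point a fixed address, which is what makes the encoding spatially consistent across poses and frames.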

📝 Abstract
Photorealistic and animatable human avatars are a key enabler for virtual/augmented reality, telepresence, and digital entertainment. While recent advances in 3D Gaussian Splatting (3DGS) have greatly improved rendering quality and efficiency, existing methods still face fundamental challenges, including time-consuming per-subject optimization and poor generalization under sparse monocular inputs. In this work, we present the Parametric Gaussian Human Model (PGHM), a generalizable and efficient framework that integrates human priors into 3DGS for fast and high-fidelity avatar reconstruction from monocular videos. PGHM introduces two core components: (1) a UV-aligned latent identity map that compactly encodes subject-specific geometry and appearance into a learnable feature tensor; and (2) a disentangled Multi-Head U-Net that predicts Gaussian attributes by decomposing static, pose-dependent, and view-dependent components via conditioned decoders. This design enables robust rendering quality under challenging poses and viewpoints, while allowing efficient subject adaptation without requiring multi-view capture or long optimization time. Experiments show that PGHM is significantly more efficient than optimization-from-scratch methods, requiring only approximately 20 minutes per subject to produce avatars with comparable visual quality, thereby demonstrating its practical applicability for real-world monocular avatar creation.
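The abstract's second component, the disentangled Multi-Head U-Net, can be pictured along the following lines. This is a hedged sketch only: a compact convolutional backbone stands in for the actual U-Net, and the head layout, conditioning scheme, and attribute channel counts (offsets, scales, rotations, colors) are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiHeadGaussianDecoder(nn.Module):
    """Disentangled prediction of Gaussian attributes from UV features.

    Hypothetical sketch: a small conv stack replaces the paper's U-Net;
    head layouts and conditioning inputs are assumptions.
    """

    def __init__(self, in_ch: int = 32, pose_dim: int = 72, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        # Static head: per-texel position offsets, scales, rotations, base color
        # (3 + 3 + 4 + 3 = 13 channels in this sketch).
        self.static_head = nn.Conv2d(hidden, 13, 1)
        # Pose head: pose-dependent corrections, conditioned on body pose params.
        self.pose_head = nn.Conv2d(hidden + pose_dim, 13, 1)
        # View head: view-dependent color residual, conditioned on view direction.
        self.view_head = nn.Conv2d(hidden + 3, 3, 1)

    def forward(self, uv_feats, pose, view_dir):
        # uv_feats: (B, in_ch, H, W); pose: (B, pose_dim); view_dir: (B, 3)
        h = self.backbone(uv_feats)
        B, _, H, W = h.shape
        pose_map = pose.view(B, -1, 1, 1).expand(B, pose.shape[1], H, W)
        view_map = view_dir.view(B, 3, 1, 1).expand(B, 3, H, W)
        static = self.static_head(h)
        pose_delta = self.pose_head(torch.cat([h, pose_map], dim=1))
        view_delta = self.view_head(torch.cat([h, view_map], dim=1))
        # Final attributes = static + pose correction; color gets a view residual.
        return static + pose_delta, view_delta
```

Keeping pose- and view-dependent corrections in separate conditioned heads lets the static identity component stay fixed across frames, which is consistent with the stable rendering the abstract claims under challenging poses and viewpoints.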
Problem

Research questions and friction points this paper is trying to address.

Efficient high-fidelity avatar reconstruction from monocular videos
Overcoming time-consuming per-subject optimization in 3DGS
Improving generalization under sparse monocular inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

UV-aligned latent identity map for compact encoding
Disentangled Multi-Head U-Net for Gaussian attributes
Efficient per-subject adaptation in roughly 20 minutes (adaptation loop sketched below)
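The roughly 20-minute adaptation plausibly amounts to optimizing the subject's identity map, and lightly fine-tuning the pretrained decoders, against the monocular video with a photometric loss. The loop below is a hypothetical sketch reusing the classes from the earlier sketches; monocular_video, render_gaussians, and the frame fields are placeholders, not the paper's API.

```python
import torch

# Hypothetical adaptation loop; only names from the sketches above are reused.
identity_map = UVIdentityMap()
decoder = MultiHeadGaussianDecoder()
optimizer = torch.optim.Adam([
    {"params": identity_map.parameters(), "lr": 1e-2},  # fit from scratch
    {"params": decoder.parameters(), "lr": 1e-4},       # pretrained prior, small lr
])

for frame in monocular_video:  # placeholder dataset of posed video frames
    optimizer.zero_grad()
    feats = identity_map.feature_map                     # (1, C, H, W) UV features
    attrs, view_res = decoder(feats, frame.pose, frame.view_dir)
    image = render_gaussians(attrs, view_res, frame.camera)  # placeholder renderer
    loss = (image - frame.gt_image).abs().mean()         # L1 photometric loss
    loss.backward()
    optimizer.step()
```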
👥 Authors
Cheng Peng · Tsinghua University, China
Jingxiang Sun · Tsinghua University (prev. DeepSeek, Nvidia) · 3D Vision
Yushuo Chen · Tsinghua University, China
Zhaoqi Su · Tsinghua University, China
Zhuo Su · ByteDance, China
Yebin Liu · Professor, Tsinghua University · Computer Graphics, Computational Photography, 3D Vision, Digital Humans