🤖 AI Summary
This work addresses two key bottlenecks in monocular video-driven 3D Gaussian Splatting (3DGS) for human avatars: time-consuming per-subject optimization and poor generalization under sparse input views. We propose the Parametric Gaussian Human Model (PGHM), the first general-purpose 3DGS framework for high-fidelity, animatable human reconstruction. Our method introduces: (1) a UV-aligned implicit identity map for spatially consistent identity encoding; (2) a decoupled multi-head U-Net that explicitly models and jointly optimizes geometry, appearance, pose, and viewpoint attributes; and (3) the integration of parametric human priors with UV-space feature encoding to enhance structural robustness under sparse-view conditions. The framework reconstructs a single subject in approximately 20 minutes, substantially faster than optimization from scratch, while achieving comparable visual fidelity, and it maintains stable rendering quality even under large pose variations and extreme camera viewpoints.
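Below is a minimal, hypothetical PyTorch sketch of this design, not the authors' implementation: a learnable UV-aligned identity map feeds a shared convolutional trunk (standing in for the Multi-Head U-Net), and decoupled heads predict static Gaussian attributes plus pose-dependent and view-dependent corrections. All class and parameter names, channel sizes, and the SMPL-style pose/view conditioning are assumptions made for illustration.

```python
# Hypothetical sketch (not the released PGHM code): UV-aligned identity map plus a
# shared trunk with decoupled heads for static, pose-dependent, and view-dependent
# Gaussian attributes. Names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.GroupNorm(8, c_out), nn.SiLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.GroupNorm(8, c_out), nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)


class PGHMSketch(nn.Module):
    """UV-space identity map -> shared trunk -> per-attribute heads."""

    def __init__(self, uv_res=128, id_ch=32, pose_dim=69, view_dim=3, trunk_ch=64):
        super().__init__()
        # Learnable UV-aligned identity map: one feature vector per UV texel.
        self.identity_map = nn.Parameter(torch.randn(1, id_ch, uv_res, uv_res) * 0.01)
        # Pose and view conditioning are broadcast over the UV grid as extra channels.
        self.pose_embed = nn.Linear(pose_dim, 16)
        self.view_embed = nn.Linear(view_dim, 16)
        self.trunk = ConvBlock(id_ch, trunk_ch)
        # Decoupled heads: static geometry/appearance, pose-dependent offsets,
        # view-dependent color. Channel counts follow a common 3DGS attribute layout
        # (xyz offset 3, rotation 4, scale 3, opacity 1, base color 3).
        self.static_head = nn.Conv2d(trunk_ch, 3 + 4 + 3 + 1 + 3, 1)
        self.pose_head = nn.Conv2d(trunk_ch + 16, 3, 1)   # pose-dependent xyz delta
        self.view_head = nn.Conv2d(trunk_ch + 16, 3, 1)   # view-dependent color delta

    def forward(self, pose, view_dir):
        b = pose.shape[0]
        feat = self.trunk(self.identity_map.expand(b, -1, -1, -1))
        h, w = feat.shape[-2:]
        p = self.pose_embed(pose)[:, :, None, None].expand(-1, -1, h, w)
        v = self.view_embed(view_dir)[:, :, None, None].expand(-1, -1, h, w)
        static = self.static_head(feat)                   # (B, 14, H, W)
        d_xyz = self.pose_head(torch.cat([feat, p], 1))   # pose-dependent correction
        d_rgb = self.view_head(torch.cat([feat, v], 1))   # view-dependent correction
        return static, d_xyz, d_rgb


if __name__ == "__main__":
    model = PGHMSketch()
    pose = torch.zeros(2, 69)   # e.g. SMPL-style body pose parameters (assumed)
    view = torch.randn(2, 3)    # camera viewing direction
    static, d_xyz, d_rgb = model(pose, view)
    print(static.shape, d_xyz.shape, d_rgb.shape)
```

In the full method, the predicted UV-space attribute maps would presumably be lifted to 3D Gaussians anchored to the parametric body prior and rendered by splatting; that step is omitted here.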
📝 Abstract
Photorealistic and animatable human avatars are a key enabler for virtual/augmented reality, telepresence, and digital entertainment. While recent advances in 3D Gaussian Splatting (3DGS) have greatly improved rendering quality and efficiency, existing methods still face fundamental challenges, including time-consuming per-subject optimization and poor generalization under sparse monocular inputs. In this work, we present the Parametric Gaussian Human Model (PGHM), a generalizable and efficient framework that integrates human priors into 3DGS for fast and high-fidelity avatar reconstruction from monocular videos. PGHM introduces two core components: (1) a UV-aligned latent identity map that compactly encodes subject-specific geometry and appearance into a learnable feature tensor; and (2) a disentangled Multi-Head U-Net that predicts Gaussian attributes by decomposing static, pose-dependent, and view-dependent components via conditioned decoders. This design enables robust rendering quality under challenging poses and viewpoints, while allowing efficient subject adaptation without requiring multi-view capture or long optimization time. Experiments show that PGHM is significantly more efficient than optimization-from-scratch methods, requiring only approximately 20 minutes per subject to produce avatars with comparable visual quality, thereby demonstrating its practical applicability for real-world monocular avatar creation.
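As a rough illustration of the fast per-subject adaptation claimed above, the hypothetical sketch below freezes a pretrained shared decoder and optimizes only the subject's UV-aligned identity map against monocular frames with a photometric loss. `rasterize_gaussians` is a dummy placeholder for a differentiable 3DGS renderer, and all names, losses, and hyperparameters are assumptions rather than the paper's actual procedure.

```python
# Hypothetical per-subject adaptation loop, assuming a pretrained shared decoder
# (e.g. the PGHMSketch above). Only the identity map is optimized for this subject.
import torch
import torch.nn.functional as F


def rasterize_gaussians(attributes, camera):
    # Placeholder: a real implementation would splat the predicted Gaussians with a
    # differentiable rasterizer. Here we return a dummy image so the loop runs.
    return attributes.mean(dim=1, keepdim=True).expand(-1, 3, -1, -1)


def adapt_to_subject(model, frames, cameras, poses, views, steps=2000, lr=1e-2):
    """Fit the UV-aligned identity map of `model` to one monocular video."""
    # Freeze the shared trunk and heads; adapt only the subject-specific identity map.
    for p in model.parameters():
        p.requires_grad_(False)
    model.identity_map.requires_grad_(True)
    opt = torch.optim.Adam([model.identity_map], lr=lr)

    for step in range(steps):
        i = step % len(frames)
        static, d_xyz, d_rgb = model(poses[i:i + 1], views[i:i + 1])
        attributes = torch.cat([static, d_xyz, d_rgb], dim=1)
        pred = rasterize_gaussians(attributes, cameras[i])
        loss = F.l1_loss(pred, frames[i:i + 1])   # photometric loss against the frame
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```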