GUAVA: Generalizable Upper Body 3D Gaussian Avatar

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of reconstructing a high-fidelity, animatable 3D upper-body avatar from a single image, a task hindered by existing methods' reliance on multi-view or video inputs, their requirement of subject-specific training, and the limited expressiveness of the SMPLX parametric representation. The authors propose the first generalizable framework for single-image upper-body Gaussian avatar reconstruction. The method combines inverse texture mapping, projection-based sampling, 3D Gaussian splatting, and a neural refinement network, built on an expressive human model (EHM) that explicitly enhances facial and hand articulation, thereby overcoming the geometric and expressive limitations of SMPLX. The framework reconstructs an avatar in roughly 0.1 seconds and supports real-time rendering and fine-grained animation. Quantitative and qualitative evaluations demonstrate superior performance over state-of-the-art approaches in fidelity, generalization to unseen subjects, and computational efficiency.

📝 Abstract
Reconstructing a high-quality, animatable 3D human avatar with expressive facial and hand motions from a single image has gained significant attention due to its broad application potential. 3D human avatar reconstruction typically requires multi-view or monocular videos and per-subject training, which is both complex and time-consuming. Furthermore, limited by SMPLX's expressiveness, these methods often focus on body motion but struggle with facial expressions. To address these challenges, we first introduce an expressive human model (EHM) to enhance facial expression capabilities and develop an accurate tracking method for it. Based on this template model, we propose GUAVA, the first framework for fast, animatable upper-body 3D Gaussian avatar reconstruction. We leverage inverse texture mapping and projection sampling to infer Ubody (upper-body) Gaussians from a single image, and the rendered images are refined by a neural refiner. Experimental results demonstrate that GUAVA significantly outperforms previous methods in rendering quality and offers a substantial speed advantage, with sub-second reconstruction (about 0.1 s) and support for real-time animation and rendering.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing a 3D human avatar from a single image
Enhancing facial expression capabilities in avatars
Achieving fast, real-time animation and rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expressive human model enhances facial expressions
Inverse texture mapping for Gaussian inference
Neural refiner improves rendered image quality
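The projection-sampling idea named above, projecting template vertices into the image and gathering per-vertex features that can seed the Gaussians, can be sketched in a few lines. This is an illustrative reconstruction of the general technique, not the paper's code: the intrinsics matrix `K`, the feature map, and both function names are hypothetical.

```python
import numpy as np

def project_points(verts, K):
    """Pinhole projection of (N, 3) camera-space points with intrinsics K.

    Returns (N, 2) float pixel coordinates after perspective divide.
    """
    proj = verts @ K.T
    return proj[:, :2] / proj[:, 2:3]

def sample_features(feat_map, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel coords."""
    H, W, _ = feat_map.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    wu, wv = (u - u0)[:, None], (v - v0)[:, None]
    # Weighted average of the four neighboring feature vectors.
    return ((1 - wu) * (1 - wv) * feat_map[v0, u0]
            + wu * (1 - wv) * feat_map[v0, u1]
            + (1 - wu) * wv * feat_map[v1, u0]
            + wu * wv * feat_map[v1, u1])
```

In a pipeline like GUAVA's, features sampled this way for each template vertex would presumably be passed to a network that predicts the per-Gaussian attributes (position offset, scale, rotation, opacity, color).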
Authors
Dongbin Zhang · Tsinghua University
Yunfei Liu · International Digital Economy Academy (IDEA)
Lijian Lin · Tencent ARC Lab (Computer Vision: Visual Tracking, Video Object Detection)
Ye Zhu · International Digital Economy Academy (IDEA)
Yang Li · Tsinghua Shenzhen International Graduate School, Tsinghua University
Minghan Qin · Bytedance Research | Tsinghua University (Computer Vision: 3D Vision, 3D Scene Perception)
Yu Li · International Digital Economy Academy (IDEA)
Haoqian Wang · Tsinghua Shenzhen International Graduate School, Tsinghua University