SEGA: Drivable 3D Gaussian Head Avatar from a Single Image

📅 2025-04-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for reconstructing high-fidelity, animatable 3D head avatars rely heavily on multi-view inputs, limiting their practical applicability. Method: We propose a single-image reconstruction framework based on hierarchical UV-space Gaussian splatting. Our approach jointly leverages 2D foundation-model priors and FLAME geometric priors to disentangle static identity from dynamic expression components, introduces a dual-branch dynamic modeling scheme with FLAME-based deformation constraints, and supports personalized fine-tuning. Results: Given only a single image, our method achieves state-of-the-art identity consistency, cross-view geometric coherence, and expression realism. It enables real-time rendering and animation, significantly enhancing both the practicality and the generalization of single-image digital-human generation.

📝 Abstract
Creating photorealistic 3D head avatars from limited input has become increasingly important for applications in virtual reality, telepresence, and digital entertainment. While recent advances like neural rendering and 3D Gaussian splatting have enabled high-quality digital human avatar creation and animation, most methods rely on multiple images or multi-view inputs, limiting their practicality for real-world use. In this paper, we propose SEGA, a novel approach for Single-imagE-based 3D drivable Gaussian head Avatar creation that combines generalized prior models with a new hierarchical UV-space Gaussian Splatting framework. SEGA seamlessly combines priors derived from large-scale 2D datasets with 3D priors learned from multi-view, multi-expression, and multi-ID data, achieving robust generalization to unseen identities while ensuring 3D consistency across novel viewpoints and expressions. We further present a hierarchical UV-space Gaussian Splatting framework that leverages FLAME-based structural priors and employs a dual-branch architecture to disentangle dynamic and static facial components effectively. The dynamic branch encodes expression-driven fine details, while the static branch focuses on expression-invariant regions, enabling efficient parameter inference and precomputation. This design maximizes the utility of limited 3D data and achieves real-time performance for animation and rendering. Additionally, SEGA performs person-specific fine-tuning to further enhance the fidelity and realism of the generated avatars. Experiments show our method outperforms state-of-the-art approaches in generalization ability, identity preservation, and expression realism, advancing one-shot avatar creation for practical applications.
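The abstract's dual-branch design, as described, stores Gaussian attributes in a UV parameter map, with a static branch for expression-invariant regions and a dynamic branch producing expression-driven offsets. The toy sketch below illustrates that decomposition only; the UV resolution, attribute layout, region mask, and the random linear layer standing in for the learned dynamic network are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch (not the paper's code): Gaussian attributes stored per UV texel.
# Assumed layout: position offset (3) + scale (3) + rotation quat (4)
# + opacity (1) + color (3) = 14 channels.
UV_RES = 64
ATTR_DIM = 3 + 3 + 4 + 1 + 3

rng = np.random.default_rng(0)

# Static branch: expression-invariant attributes, precomputable once per identity.
static_map = rng.normal(size=(UV_RES, UV_RES, ATTR_DIM)).astype(np.float32)

def dynamic_branch(expression_code: np.ndarray) -> np.ndarray:
    """Hypothetical dynamic branch: maps a FLAME-style expression code to
    per-texel attribute offsets via a fixed random linear layer (a stand-in
    for the paper's learned network)."""
    W = rng.normal(size=(expression_code.shape[0], UV_RES * UV_RES * ATTR_DIM)) * 0.01
    return (expression_code @ W).reshape(UV_RES, UV_RES, ATTR_DIM).astype(np.float32)

def region_mask() -> np.ndarray:
    """Hypothetical UV mask: 1 on expression-driven texels, 0 on static
    regions (here the lower half of the UV map is arbitrarily 'dynamic')."""
    mask = np.zeros((UV_RES, UV_RES, 1), dtype=np.float32)
    mask[UV_RES // 2:, :] = 1.0
    return mask

expr = rng.normal(size=(10,)).astype(np.float32)  # assumed 10-D expression code
gaussians = static_map + region_mask() * dynamic_branch(expr)
print(gaussians.shape)  # (64, 64, 14): one Gaussian per UV texel
```

The point of the split is that only the masked texels depend on the expression code, so the static map can be cached after the one-shot reconstruction and only the dynamic offsets need recomputing per frame.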
Problem

Research questions and friction points this paper is trying to address.

Creating 3D head avatars from single images
Combining 2D and 3D priors for generalization
Achieving real-time animation and rendering performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-image 3D Gaussian head avatar creation
Hierarchical UV-space Gaussian Splatting framework
Dual-branch architecture for dynamic-static disentanglement
Chen Guo
ETH Zurich
Computer Vision · Digital Humans · Virtual Humans
Zhuo Su
Tsinghua Shenzhen International Graduate School
Jian Wang
ByteDance
Shuang Li
ByteDance
Xu Chang
ByteDance
Zhaohu Li
ByteDance
Yang Zhao
ByteDance
Guidong Wang
ByteDance
Ruqi Huang
Tsinghua Shenzhen International Graduate School
3D Computer Vision · Shape Analysis · Geometry Processing