Reconstruction and Reenactment Separated Method for Realistic Gaussian Head

📅 2025-09-05
🤖 AI Summary
This paper addresses the generation of controllable 3D Gaussian head avatars from a single portrait image with real-time rendering. It proposes a framework that decouples reconstruction from reenactment. First, a large-scale one-shot Gaussian head generator built on WebSSL is trained in two stages to improve generalization in geometry reconstruction and to preserve high-frequency texture details. Second, an ultra-lightweight driving module separates geometry and appearance modeling from dynamic control, so that enlarging the reconstruction module does not reduce driving efficiency. The framework follows a scaling law: increasing the parameter count of the reconstruction module improves performance. The avatar supports control over pose, expression, and illumination, and renders at 90 FPS at 512×512 resolution. Quantitative and qualitative evaluations show that the method outperforms current state-of-the-art approaches.

📝 Abstract
In this paper, we explore a reconstruction and reenactment separated framework for 3D Gaussian heads, which requires only a single portrait image as input to generate a controllable avatar. Specifically, we develop a large-scale one-shot Gaussian head generator built upon WebSSL and employ a two-stage training approach that significantly enhances generalization and high-frequency texture reconstruction. During inference, an ultra-lightweight Gaussian avatar driven by control signals enables high frame-rate rendering, achieving 90 FPS at a resolution of 512×512. We further demonstrate that the proposed framework follows the scaling law, whereby increasing the parameter scale of the reconstruction module leads to improved performance. Moreover, thanks to the separation design, driving efficiency remains unaffected. Finally, extensive quantitative and qualitative experiments validate that our approach outperforms current state-of-the-art methods.
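
The separation design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `GaussianHead`, `reconstruct`, `drive`, and `frame_budget_ms` are hypothetical names, a seeded random generator stands in for the large reconstruction network, and a rigid translation stands in for the real control signal. The only point is structural: the expensive step runs once per portrait, and the per-frame step never calls it again.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class GaussianHead:
    """Hypothetical container: one position and one color per Gaussian."""
    positions: tuple  # tuple of (x, y, z) tuples
    colors: tuple     # tuple of (r, g, b) tuples


def reconstruct(portrait_seed: int, num_gaussians: int = 1000) -> GaussianHead:
    """Placeholder for the large one-shot generator; runs once per portrait.

    Assumption: seeded random values stand in for the real network output.
    """
    rng = random.Random(portrait_seed)
    positions = tuple(
        (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        for _ in range(num_gaussians)
    )
    colors = tuple(
        (rng.random(), rng.random(), rng.random())
        for _ in range(num_gaussians)
    )
    return GaussianHead(positions, colors)


def drive(head: GaussianHead, offset: tuple) -> GaussianHead:
    """Placeholder for the lightweight driver; runs once per frame.

    Assumption: a rigid translation stands in for the real control signal.
    The cost depends on the number of Gaussians, not on the generator size.
    """
    dx, dy, dz = offset
    moved = tuple((x + dx, y + dy, z + dz) for x, y, z in head.positions)
    return GaussianHead(moved, head.colors)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


head = reconstruct(portrait_seed=0)  # expensive step, done once
frames = [drive(head, (0.01 * t, 0.0, 0.0)) for t in range(3)]  # per frame
print(f"{frame_budget_ms(90):.2f} ms per frame at 90 FPS")
```

At the reported 90 FPS, each frame has a budget of roughly 11.11 ms, which is why keeping the reconstruction network out of the per-frame path matters as that network grows.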

Problem

Research questions and friction points this paper is trying to address.

Single-image 3D Gaussian head avatar generation
High-frequency texture reconstruction enhancement
Real-time controllable avatar driving at 90 FPS

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot Gaussian head generator using WebSSL
Two-stage training improves generalization and high-frequency texture reconstruction
Ultra-lightweight avatar enables 90 FPS rendering
Zhiling Ye
Mashang Consumer Finance Co., Ltd.
Cong Zhou
Anuttacon
speech synthesis, speech understanding, audio coding, multimodality LLM
Xiubao Zhang
Mashang Consumer Finance Co., Ltd.
Haifeng Shen
Mashang Consumer Finance Co., Ltd.
Weihong Deng
Professor, Beijing University of Posts and Telecommunications
Multimodal Learning, Trustworthy AI, Affective Computing, Biometrics
Quan Lu
Mashang Consumer Finance Co., Ltd.