🤖 AI Summary
This work addresses the challenge of reconstructing realistic and animatable 4D head avatars from a single portrait image, a task often hindered by inconsistent 3D geometry. The authors propose a geometry-aware diffusion framework that, for the first time, incorporates geometric priors directly into the diffusion model to jointly generate the portrait image and its corresponding surface normals. An identity-agnostic expression encoder extracts implicit facial expression representations; these expression latents, together with the synthesized images, are incorporated into a 3D Gaussian avatar for high-fidelity rendering. The method significantly outperforms existing approaches in visual quality, expression fidelity, and cross-identity generalization, while also supporting real-time rendering.
📝 Abstract
Reconstructing photorealistic and animatable 4D head avatars from a single portrait image remains a fundamental challenge in computer vision. While diffusion models have enabled remarkable progress in image and video generation for avatar reconstruction, existing methods primarily rely on 2D priors and struggle to achieve consistent 3D geometry. We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free expression encoder captures implicit expression representations. Both synthesized images and expression latents are incorporated into 3D Gaussian-based avatars, enabling photorealistic rendering with accurate geometry. Extensive experiments demonstrate that our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization, while supporting real-time rendering.
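To make the described data flow concrete, the sketch below illustrates one plausible reading of the pipeline: an expression encoder maps a driving frame to an implicit latent, a geometry-aware diffusion model conditioned on that latent jointly predicts a portrait image and its surface-normal map, and both outputs would then feed a 3D Gaussian avatar. This is a minimal, hypothetical illustration, not the authors' implementation; all module names, shapes, and the placeholder single-layer "diffusion" backbone are assumptions made only to show the interfaces.

```python
# Hypothetical sketch of the pipeline described in the abstract (NOT the authors' code).
# Every module here is a lightweight stand-in; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class GeometryAwareDiffusionStub(nn.Module):
    """Stand-in for the geometry-aware diffusion model: conditioned on an expression
    latent, it jointly predicts a portrait image and a surface-normal map.
    A single conv layer replaces the actual denoising network for illustration."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # 3 input channels (source portrait) -> 6 output channels (3 RGB + 3 normals).
        self.backbone = nn.Conv2d(3, 6, kernel_size=3, padding=1)
        self.expr_proj = nn.Linear(latent_dim, 6)

    def forward(self, portrait: torch.Tensor, expr_latent: torch.Tensor):
        feat = self.backbone(portrait)
        # Inject the expression latent as a per-channel bias (placeholder conditioning).
        feat = feat + self.expr_proj(expr_latent)[:, :, None, None]
        rgb, normals = feat[:, :3], feat[:, 3:]
        return rgb, normals


class ExpressionEncoderStub(nn.Module):
    """Stand-in for the pose-free, identity-agnostic expression encoder that maps a
    driving frame to an implicit expression latent."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, latent_dim),
        )

    def forward(self, driving_frame: torch.Tensor) -> torch.Tensor:
        return self.encoder(driving_frame)


def reconstruct_avatar_inputs(source_portrait, driving_frame):
    """High-level flow: encode the expression, run geometry-aware diffusion, and return
    the image/normal pair plus the latent that a 3D Gaussian avatar would consume."""
    expr_encoder = ExpressionEncoderStub()
    diffusion = GeometryAwareDiffusionStub()
    expr_latent = expr_encoder(driving_frame)                # implicit expression code
    rgb, normals = diffusion(source_portrait, expr_latent)   # joint image + normals
    # A 3D Gaussian avatar (not sketched here) would use rgb, normals, and expr_latent
    # to render the final, real-time animatable head.
    return rgb, normals, expr_latent


if __name__ == "__main__":
    src = torch.randn(1, 3, 256, 256)   # single source portrait
    drv = torch.randn(1, 3, 256, 256)   # driving frame providing the target expression
    rgb, normals, z = reconstruct_avatar_inputs(src, drv)
    print(rgb.shape, normals.shape, z.shape)
```

The key interface point the sketch tries to capture is that the diffusion stage outputs image and normals together, so the downstream 3D Gaussian representation receives appearance and geometry supervision from the same generative pass.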