FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key bottlenecks in single-image 4D facial avatar generation (geometric distortion, identity/expression inconsistency, and heavy reliance on multi-view data), this paper proposes the first unified framework to jointly leverage shape, image, and video priors. Methodologically: (1) we integrate 3D-GAN inversion with diffusion-based, depth-guided texture mapping to enhance geometric fidelity and cross-view texture consistency; (2) we design a video-synchronized driving-signal modeling module to improve the temporal naturalness of expressions; (3) we introduce a consistent-inconsistent joint training strategy to explicitly disentangle identity from dynamic attributes. The approach achieves full-view, high-fidelity 4D reconstruction from a single input image, and quantitative and qualitative evaluations show that it significantly outperforms state-of-the-art methods in geometry accuracy, cross-view consistency, and animation quality.

📝 Abstract
We present a novel framework for generating high-quality, animatable 4D avatars from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multiview data or struggle with shape accuracy and identity consistency. To address these limitations, we propose a comprehensive system that leverages shape, image, and video priors to create full-view, animatable avatars. Our approach first obtains an initial coarse shape through 3D-GAN inversion. It then enhances multiview textures using depth-guided warping signals, with the help of an image diffusion model, for cross-view consistency. To handle expression animation, we incorporate a video prior with synchronized driving signals across viewpoints. We further introduce a Consistent-Inconsistent training strategy to effectively handle data inconsistencies during 4D reconstruction. Experimental results demonstrate that our method achieves superior quality compared to prior art while maintaining consistency across viewpoints and expressions.
Problem

Research questions and friction points this paper is trying to address.

Generating animatable 4D avatars from single images
Overcoming multiview data dependency and shape inaccuracy
Ensuring identity consistency across viewpoints and expressions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages shape, image, video priors
Uses 3D-GAN inversion for initial shape
Depth-guided warping for texture consistency
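The depth-guided warping idea in the bullets above can be sketched in a few lines: each source pixel is back-projected to 3D using its depth, rigidly transformed into a novel camera frame, and re-projected to produce a warped image. The sketch below is hypothetical and uses standard pinhole-camera notation (intrinsics K, relative rotation R, translation t); it does not reproduce the paper's actual warping signals or the diffusion-based texture refinement.

```python
import numpy as np

def depth_guided_warp(src_img, depth, K, R, t):
    """Warp a source image into a novel view using per-pixel depth.

    Minimal forward-splatting sketch (nearest-pixel, last write wins);
    a real pipeline would handle occlusion and holes more carefully.
    """
    h, w = depth.shape
    # Pixel grid in homogeneous coordinates, flattened row-major.
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project to 3D camera coordinates using the depth map.
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Rigid transform into the target camera frame, then project.
    cam_t = R @ cam + t.reshape(3, 1)
    proj = K @ cam_t
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)  # guard against z ~ 0
    # Splat source colors onto the nearest target pixel.
    warped = np.zeros_like(src_img)
    u = np.round(uv[0]).astype(int)
    v = np.round(uv[1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped[v[valid], u[valid]] = src_img.reshape(-1, src_img.shape[-1])[valid]
    return warped
```

With an identity pose (R = I, t = 0) the warp reproduces the source image, which is a useful sanity check before plugging such a signal into a cross-view consistency loss.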
🔎 Similar Papers
No similar papers found.
Authors
Fei Yin (University of Cambridge)
R MallikarjunB (Stability AI)
Chun-Han Yao (Stability AI; Computer Vision)
Rafal Mantiuk (University of Cambridge)
Varun Jampani (Vice President of Research, Stability AI; Computer Vision, Machine Learning)