FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

📅 2025-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper targets key bottlenecks in generating 4D facial avatars from a single image: geometric distortion, identity and expression inconsistency, and heavy reliance on multi-view data. It proposes a unified framework that jointly leverages shape, image, and video priors. The method (1) obtains an initial coarse shape through 3D-GAN inversion and refines multi-view textures with an image diffusion model guided by depth-based warping signals, improving geometric fidelity and cross-view texture consistency; (2) animates expressions using a video prior with driving signals synchronized across viewpoints; and (3) introduces a Consistent-Inconsistent training strategy to handle data inconsistencies during 4D reconstruction. The result is a full-view, animatable 4D avatar reconstructed from a single input image, with reported quality superior to prior methods and consistency maintained across viewpoints and expressions.

📝 Abstract

We present a novel framework for generating a high-quality, animatable 4D avatar from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multiview data or struggle with shape accuracy and identity consistency. To address these limitations, we propose a comprehensive system that leverages shape, image, and video priors to create full-view, animatable avatars. Our approach first obtains an initial coarse shape through 3D-GAN inversion. It then enhances multiview textures with an image diffusion model, using depth-guided warping signals to enforce cross-view consistency. To handle expression animation, we incorporate a video prior with driving signals synchronized across viewpoints. We further introduce a Consistent-Inconsistent training strategy to handle data inconsistencies during 4D reconstruction. Experimental results demonstrate that our method achieves superior quality compared to the prior art while maintaining consistency across different viewpoints and expressions.
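The abstract does not say how the 3D-GAN inversion is carried out. As a rough illustration of the general idea only, the sketch below inverts a toy generator by gradient descent on its latent code. The generator `toy_generator`, the weight matrix `W`, the function `invert_latent`, and all hyperparameters are hypothetical stand-ins; they are not the paper's 3D-aware GAN or its inversion procedure.

```python
import numpy as np

def toy_generator(z, W):
    """Hypothetical stand-in for a pretrained generator: latent code -> flat 'image'."""
    return np.tanh(W @ z)

def invert_latent(target, W, steps=500, lr=0.01, seed=0):
    """Optimization-based inversion: find z such that toy_generator(z) is close to target.

    Minimizes the squared reconstruction error by plain gradient descent.
    Returns the recovered latent code and the loss recorded before each step.
    """
    rng = np.random.default_rng(seed)
    z = 0.1 * rng.standard_normal(W.shape[1])
    losses = []
    for _ in range(steps):
        out = toy_generator(z, W)
        residual = out - target
        losses.append(float(residual @ residual))
        # Chain rule through tanh: d tanh(a)/da = 1 - tanh(a)^2
        grad = 2.0 * W.T @ (residual * (1.0 - out ** 2))
        z = z - lr * grad
    return z, losses

# Toy setup: a fixed random "generator" and a target produced from a known latent code.
rng = np.random.default_rng(42)
W = rng.standard_normal((64, 8)) / np.sqrt(8)
z_true = rng.standard_normal(8)
target = toy_generator(z_true, W)

z_hat, losses = invert_latent(target, W)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In a real system the generator would be a pretrained 3D-aware GAN and the loss would typically include perceptual and identity terms; those details are not given in this summary.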

Problem

Research questions and friction points this paper is trying to address.

Generating animatable 4D avatars from single images

Overcoming multiview data dependency and shape inaccuracy

Ensuring identity consistency across viewpoints and expressions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages shape, image, and video priors

Uses 3D-GAN inversion for initial shape

Depth-guided warping for texture consistency
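To make the depth-guided warping idea concrete, here is a minimal NumPy sketch of forward warping a source view into a target view using per-pixel depth. It is an assumption-laden illustration, not the paper's implementation: the function `warp_with_depth`, the shared pinhole intrinsics `K`, the pose `(R, t)`, and the nearest-pixel splatting are all choices made for this example.

```python
import numpy as np

def warp_with_depth(src_img, src_depth, K, R, t):
    """Forward-warp src_img into a target view using per-pixel depth.

    Assumptions (illustrative only): both views share pinhole intrinsics K (3x3),
    and (R, t) map source-camera coordinates to target-camera coordinates.
    Returns the warped image and a boolean mask of target pixels that were hit.
    """
    h, w = src_depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)

    # Back-project pixels to 3D, move them to the target camera, and re-project.
    pts_src = (np.linalg.inv(K) @ pix) * src_depth.ravel()
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)
    proj = K @ pts_tgt
    z = proj[2]

    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    u_t = np.rint(proj[0] / z_safe).astype(np.int64)
    v_t = np.rint(proj[1] / z_safe).astype(np.int64)
    valid &= (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)

    # Simple z-buffer: write far points first so nearer points overwrite them
    # (with repeated indices, NumPy assignment keeps the last value written).
    idx = np.flatnonzero(valid)
    order = idx[np.argsort(-z[idx])]
    tgt_index = v_t[order] * w + u_t[order]

    colors = src_img.reshape(h * w, -1)
    warped_flat = np.zeros_like(colors)
    mask_flat = np.zeros(h * w, dtype=bool)
    warped_flat[tgt_index] = colors[order]
    mask_flat[tgt_index] = True
    return warped_flat.reshape(src_img.shape), mask_flat.reshape(h, w)

# Example: constant depth and a small sideways camera shift move the image by one pixel.
K = np.array([[50.0, 0.0, 2.0], [0.0, 50.0, 1.5], [0.0, 0.0, 1.0]])
img = np.arange(4 * 5 * 3, dtype=np.float64).reshape(4, 5, 3)
depth = np.full((4, 5), 2.0)
shifted, mask = warp_with_depth(img, depth, K, np.eye(3), np.array([0.04, 0.0, 0.0]))
```

The mask marks target pixels that received no source pixel (disocclusions). In a pipeline like the one described above, such regions would be the natural places for a diffusion model to fill in texture, though how the paper uses the warping signal is not detailed in this summary.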

🔎 Similar Papers

Single Image, Any Face: Generalisable 3D Face Generation