FaceLift: Single Image to 3D Head with View Generation and GS-LRM

📅 2024-12-23
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
Existing monocular 3D face reconstruction methods suffer from limited multi-view supervision, which hinders full-coverage reconstruction with cross-view consistency. To address this, we propose a feed-forward framework: a multi-view latent diffusion model, trained on synthetic data, first generates identity-consistent side and back views; these, together with the input frontal image, are fed into GS-LRM, a Gaussian-splatting-based Large Reconstruction Model, to produce high-fidelity 360° 3D head avatars. The key innovation is coupling multi-view diffusion generation with GS-LRM reconstruction: trained exclusively on synthetic data, the system nonetheless generalizes strongly to real-world images. The method supports single-image 3D reconstruction, video-driven 4D novel-view synthesis, and 2D-landmark-driven 3D facial animation. Quantitative and qualitative evaluations show clear improvements over state-of-the-art methods, particularly in identity fidelity, cross-view consistency, and generalization.

📝 Abstract
We present FaceLift, a feed-forward approach for rapid, high-quality, 360-degree head reconstruction from a single image. Our pipeline begins by employing a multi-view latent diffusion model that generates consistent side and back views of the head from a single facial input. These generated views then serve as input to a GS-LRM reconstructor, which produces a comprehensive 3D representation using Gaussian splats. To train our system, we develop a dataset of multi-view renderings using synthetic 3D human head assets. The diffusion-based multi-view generator is trained exclusively on synthetic head images, while the GS-LRM reconstructor undergoes initial training on Objaverse followed by fine-tuning on synthetic head data. FaceLift excels at preserving identity and maintaining consistency across views. Despite being trained solely on synthetic data, FaceLift demonstrates remarkable generalization to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art methods in 3D head reconstruction, highlighting its practical applicability and robust performance on real-world images. In addition to single image reconstruction, FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates with 2D reanimation techniques to enable 3D facial animation. Project page: https://weijielyu.github.io/FaceLift.
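The two-stage pipeline described in the abstract (multi-view diffusion, then feed-forward Gaussian-splat reconstruction) can be sketched in terms of data flow. The sketch below uses placeholder stubs with hypothetical names and shapes; it is not the authors' implementation, only an illustration of which tensors move between the stages.

```python
import numpy as np

def multiview_diffusion(front_image, n_views=6, seed=0):
    """Stand-in for FaceLift's multi-view latent diffusion model: given one
    frontal image, emit identity-consistent side and back views. This stub
    just returns arrays of the right shape (view count is an assumption)."""
    rng = np.random.default_rng(seed)
    h, w, c = front_image.shape
    views = rng.random((n_views, h, w, c))
    views[0] = front_image  # the input view is carried through unchanged
    return views

def gs_lrm(views, n_gaussians=1024, seed=0):
    """Stand-in for the GS-LRM reconstructor: a feed-forward model mapping
    the generated views to a set of 3D Gaussian splats. Each splat is
    parameterized here by position (3) + scale (3) + rotation quaternion (4)
    + opacity (1) + RGB color (3) = 14 values (a common 3DGS layout; the
    splat count and exact parameterization are assumptions)."""
    rng = np.random.default_rng(seed)
    return rng.random((n_gaussians, 14))

# Single-image reconstruction: one frontal photo in, a 360-degree
# splat-based head representation out.
front = np.zeros((256, 256, 3))
views = multiview_diffusion(front)   # (n_views, H, W, 3)
splats = gs_lrm(views)               # (n_gaussians, 14)
```

For the 4D video use case mentioned in the abstract, this same two-stage call would run per frame.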
Problem

Research questions and friction points this paper is trying to address.

Achieving 360-degree 3D head reconstruction from single image
Ensuring view consistency in monocular 3D face reconstruction
Bridging domain gap between synthetic and real-world images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view latent diffusion model generates consistent views
Transformer-based reconstructor produces 3D Gaussian splats
Synthetic dataset bridges domain gap effectively
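The "3D Gaussian splats" output by the reconstructor are, per primitive, a position, an anisotropic scale, a rotation, an opacity, and a color, with the covariance factored as Σ = R S Sᵀ Rᵀ in standard 3D Gaussian Splatting. A minimal container following those conventions (field names are illustrative, not FaceLift's actual code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    mean: np.ndarray      # (3,) center in world space
    scale: np.ndarray     # (3,) per-axis standard deviation
    rotation: np.ndarray  # (4,) unit quaternion (w, x, y, z)
    opacity: float        # alpha in [0, 1]
    color: np.ndarray     # (3,) RGB (or the SH DC term)

    def covariance(self):
        """Full 3x3 covariance Sigma = R S S^T R^T, the standard
        3D Gaussian Splatting factorization."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T

# With the identity rotation, the covariance is just diag(scale**2).
s = GaussianSplat(np.zeros(3), np.array([1.0, 2.0, 3.0]),
                  np.array([1.0, 0.0, 0.0, 0.0]), 1.0, np.ones(3))
cov = s.covariance()
```

Factoring the covariance through a rotation and a diagonal scale is what keeps every splat's Σ positive semidefinite during feed-forward prediction.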