🤖 AI Summary
This work addresses the problem of reconstructing relightable 3D head avatars from a single unconstrained, in-the-wild facial image. We propose the first identity-aware multimodal 3D latent diffusion model, conditioned on face recognition embeddings. Our method jointly generates geometry, albedo (diffuse and specular reflectance maps), and normal maps in an end-to-end manner. Fitting combines identity-preserving reverse-diffusion guidance, differentiable 3D rendering supervision, and joint perceptual and identity losses, enabling strong generalization with only limited annotated 3D data. Unlike prior approaches, the model co-generates shape and full reflectance properties end to end, significantly improving reconstruction accuracy and visual realism. The output is compatible with mainstream rendering engines and supports high-fidelity relighting under arbitrary illumination. Extensive evaluation demonstrates state-of-the-art performance on standard benchmarks.
📝 Abstract
The remarkable progress in 3D face reconstruction has resulted in high-detail, photorealistic facial representations. Recently, diffusion models have revolutionized the capabilities of generative methods, surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based generative model for 3D facial avatars. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multimodal diffusion model is the first to concurrently output facial reflectance maps (diffuse and specular albedo, and normals) and shape, showcasing strong generalization capabilities. It is trained solely on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process with perceptual and face recognition losses. As the first 3D latent diffusion model (LDM) conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines, starting only from an unconstrained facial image, and achieves state-of-the-art performance.
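The guided reverse-diffusion idea in the abstract can be sketched in miniature. The toy below runs one DDPM-style reverse step on a scalar latent, shifting the posterior mean along the gradient of a guidance loss before adding sampling noise. The linear beta schedule, the trivial `denoise` function, and the squared-error "identity" gradient are all hypothetical stand-ins, not FitDiff's actual networks, losses, or noise schedule; the sketch only illustrates how a loss gradient steers sampling.

```python
import math
from itertools import accumulate

def guided_reverse_step(x_t, t, betas, alphas_cumprod,
                        denoise_fn, guidance_grad_fn, scale=1.0, rng=None):
    """One reverse step x_t -> x_{t-1}: form the usual DDPM posterior mean
    from the predicted noise, shift it against the guidance (e.g.
    identity-loss) gradient, then add sampling noise for t > 0."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    eps_hat = denoise_fn(x_t, t)  # predicted noise at step t
    mean = (x_t - beta_t / math.sqrt(1.0 - alphas_cumprod[t]) * eps_hat) \
           / math.sqrt(alpha_t)
    mean -= scale * beta_t * guidance_grad_fn(x_t)  # guidance shift
    noise = rng.gauss(0.0, 1.0) if (t > 0 and rng is not None) else 0.0
    return mean + math.sqrt(beta_t) * noise

# Toy schedule and stand-ins (all hypothetical).
T = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas_cumprod = list(accumulate((1.0 - b for b in betas),
                                 lambda a, b: a * b))
target_identity = 0.5                            # stand-in "embedding"
id_grad = lambda x: 2.0 * (x - target_identity)  # grad of squared error
denoise = lambda x, t: 0.0                       # trivial denoiser
```

In the real model, `guidance_grad_fn` would instead backpropagate the face-recognition and perceptual losses, computed on a differentiable render of the decoded avatar, through the decoder to the latent at each sampling step.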