Multimodal Conditional 3D Face Geometry Generation

📅 2024-07-01
🏛️ Computers & Graphics
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of generating high-fidelity, controllable 3D facial geometry from diverse input modalities. We propose a diffusion-based method that operates directly in UV parameterization space to synthesize high-resolution (512×512) facial geometry. Our approach unifies six heterogeneous conditioning inputs—sketches, Canny edges, 2D keypoints, FLAME parameters, portrait images, and text prompts—within a single model. To achieve fine-grained, disentangled control over identity and expression, we introduce a novel multi-path IP-Adapter cross-attention mechanism. Additionally, we jointly encode FLAME parameters and text embeddings to enforce geometric plausibility and semantic alignment. Extensive experiments demonstrate state-of-the-art performance in cross-modal consistency, identity/expression fidelity, and interactive flexibility, significantly advancing controllable 3D face generation.

📝 Abstract
We present a new method for multimodal conditional 3D face geometry generation that allows user-friendly control over the output identity and expression via a number of different conditioning signals. Within a single model, we demonstrate 3D faces generated from artistic sketches, 2D face landmarks, Canny edges, FLAME face model parameters, portrait photos, or text prompts. Our approach is based on a diffusion process that generates 3D geometry in a 2D parameterized UV domain. Geometry generation passes each conditioning signal through a set of cross-attention layers (IP-Adapter), one set for each user-defined conditioning signal. The result is an easy-to-use 3D face generation tool that produces high-resolution geometry with fine-grained user control.
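The per-signal cross-attention described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the module name, dimensions, and the residual summation of the attention paths are assumptions, following the common IP-Adapter pattern of giving each conditioning modality its own key/value projection and cross-attention block.

```python
import torch
import torch.nn as nn

class MultiPathCrossAttention(nn.Module):
    """One cross-attention path per conditioning signal; hypothetical sketch,
    not the authors' code. Each path attends from UV-space feature tokens to
    that signal's tokens, and the results are summed residually."""
    def __init__(self, dim, cond_dims, num_heads=4):
        super().__init__()
        # One projection + attention block per modality
        # (e.g. sketch, keypoints, FLAME params, portrait, text).
        self.proj = nn.ModuleDict({
            name: nn.Linear(d, dim) for name, d in cond_dims.items()
        })
        self.attn = nn.ModuleDict({
            name: nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for name in cond_dims
        })

    def forward(self, x, conds):
        # x: UV feature tokens, shape (B, N, dim)
        # conds: dict of condition tokens, shape (B, M_k, cond_dims[k])
        out = x
        for name, tokens in conds.items():
            kv = self.proj[name](tokens)          # project to shared width
            attended, _ = self.attn[name](x, kv, kv)
            out = out + attended                  # residual injection per path
        return out

# Toy usage: inject two conditioning signals into 16 UV tokens of width 32.
layer = MultiPathCrossAttention(dim=32, cond_dims={"sketch": 8, "text": 12})
x = torch.randn(2, 16, 32)
conds = {"sketch": torch.randn(2, 5, 8), "text": torch.randn(2, 7, 12)}
y = layer(x, conds)
```

Because the paths are additive, conditions can be dropped or mixed at inference time simply by omitting entries from `conds`, which matches the single-model, multi-signal flexibility the abstract describes.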
Problem

Research questions and friction points this paper is trying to address.

Generating 3D face geometry from multimodal inputs
Enabling user control over identity and expression
Producing topology-consistent high-quality facial models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal conditional generation with cross-attention layers
Diffusion process in 2D UV domain for 3D geometry
Single model accepts multiple input signals including sketches
Authors

Christopher Otto
ETH Zürich, Switzerland and DisneyResearch|Studios, Switzerland

Prashanth Chandran
DisneyResearch|Studios, Switzerland

Sebastian Weiss
Disney Research Zürich

Markus H. Gross
ETH Zürich, Switzerland and DisneyResearch|Studios, Switzerland

Gaspard Zoss
DisneyResearch|Studios, Switzerland

Derek Bradley
DisneyResearch|Studios

Fields: Computer Graphics, Computer Vision