Bringing Diversity from Diffusion Models to Semantic-Guided Face Asset Generation

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Digital 3D facial modeling suffers from high acquisition costs, heavy manual intervention, and actor constraints, resulting in limited data diversity and poor controllability. To address this, we propose a semantic-controllable, end-to-end 3D face generation framework. Methodologically, we introduce the first diffusion-model-driven pipeline for 3D face data augmentation; design a normalization module to bridge the domain gap between synthetic and real-world scanned data; and develop a conditional GAN that jointly generates geometry and albedo, enabling continuous latent-space semantic editing and physics-informed refinement. Our contributions include: (i) a high-quality 3D face database comprising 44,000 identities; (ii) state-of-the-art generation performance, significantly surpassing baselines in FID, Chamfer distance, and user studies; and (iii) an open-source interactive web tool supporting real-time semantic control and exportable asset generation.

📝 Abstract
Digital modeling and reconstruction of human faces serve various applications. However, their availability is often hindered by the requirements of data capturing devices, manual labor, and suitable actors. This situation restricts the diversity, expressiveness, and control over the resulting models. This work aims to demonstrate that a semantically controllable generative network can provide enhanced control over the digital face modeling process. To enhance diversity beyond the limited human faces scanned in a controlled setting, we introduce a novel data generation pipeline that creates a high-quality 3D face database using a pre-trained diffusion model. Our proposed normalization module converts data synthesized by the diffusion model into data matching the quality of real scans. Using the 44,000 face models we obtained, we further developed an efficient GAN-based generator. This generator accepts semantic attributes as input and generates geometry and albedo. It also allows continuous post-editing of attributes in the latent space. Our asset refinement component subsequently creates physically-based facial assets. We introduce a comprehensive system designed for creating and editing high-quality face assets. Our proposed model has undergone extensive experiments, comparisons, and evaluations. We also integrate everything into a web-based interactive tool. We aim to make this tool publicly available with the release of the paper.
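The abstract describes a generator that maps a latent code plus semantic attributes to geometry and albedo, with continuous attribute editing in latent space. The interface can be sketched as follows; this is a purely illustrative toy (random weights in place of a trained network, made-up dimensions), not the paper's actual model:

```python
import numpy as np

# Illustrative sketch of a semantics-conditioned generator interface.
# All names and sizes below are assumptions, not taken from the paper.
rng = np.random.default_rng(0)

LATENT_DIM = 64   # assumed latent code size
ATTR_DIM = 8      # assumed number of semantic attributes (age, gender, ...)
N_VERTS = 100     # toy vertex count; real face assets use far more
TEX_RES = 16      # toy albedo texture resolution

# Random "weights" stand in for a trained conditional GAN generator.
W = rng.standard_normal(
    (LATENT_DIM + ATTR_DIM, N_VERTS * 3 + TEX_RES * TEX_RES * 3)
) * 0.01

def generate(z: np.ndarray, attrs: np.ndarray):
    """Map (latent code, semantic attributes) -> (geometry, albedo)."""
    h = np.tanh(np.concatenate([z, attrs]) @ W)
    geometry = h[: N_VERTS * 3].reshape(N_VERTS, 3)
    albedo = h[N_VERTS * 3 :].reshape(TEX_RES, TEX_RES, 3)
    albedo = (albedo + 1.0) / 2.0  # squash into [0, 1]
    return geometry, albedo

def edit_attribute(z, attrs, index, new_value):
    """Continuous post-editing: change one attribute, keep the latent fixed."""
    attrs = attrs.copy()
    attrs[index] = new_value
    return generate(z, attrs)

z = rng.standard_normal(LATENT_DIM)
attrs = np.zeros(ATTR_DIM)
geo, alb = generate(z, attrs)
geo2, alb2 = edit_attribute(z, attrs, index=0, new_value=1.0)
```

Editing a single attribute while holding the latent code fixed is what makes the post-editing "continuous": small attribute changes produce small, smooth changes in the generated geometry and albedo.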
Problem

Research questions and friction points this paper is trying to address.

Enhancing diversity in 3D face modeling using diffusion models
Generating high-quality 3D face assets with semantic control
Overcoming limitations of traditional face scanning methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models for diverse 3D face generation
Normalizes synthetic data to match scanned quality
Semantic-guided GAN for editable face assets
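The second bullet, normalizing synthetic data toward scanned quality, can be illustrated schematically with channel-wise moment matching. This is only a sketch of the domain-gap idea; the paper's normalization module is a learned component, not this simple statistic alignment:

```python
import numpy as np

def match_moments(synthetic: np.ndarray, scanned: np.ndarray) -> np.ndarray:
    """Shift and scale synthetic samples so their per-channel mean and
    standard deviation match those of a scanned reference set."""
    syn_mu, syn_sd = synthetic.mean(axis=0), synthetic.std(axis=0) + 1e-8
    ref_mu, ref_sd = scanned.mean(axis=0), scanned.std(axis=0)
    return (synthetic - syn_mu) / syn_sd * ref_sd + ref_mu

# Toy data: "scanned" samples sit in a narrow range; "synthetic" ones do not.
rng = np.random.default_rng(1)
scanned = rng.normal(loc=0.5, scale=0.1, size=(200, 3))
synthetic = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
normalized = match_moments(synthetic, scanned)
```

After matching, the synthetic samples share the reference set's first two moments, which is the crudest possible version of closing a synthetic-to-scan domain gap.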
Authors
Yunxuan Cai, University of Southern California, USC Institute for Creative Technologies, USA
Sitao Xiang, University of Southern California (Computer Graphics)
Zongjian Li, University of Southern California, USC Institute for Creative Technologies, USA
Haiwei Chen, University of Southern California (Computer Vision)
Yajie Zhao, Computer Scientist at University of Southern California (Virtual Human, Neural Render, AR/VR)