High-Fidelity 3D Facial Avatar Synthesis with Controllable Fine-Grained Expressions

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of achieving fine-grained semantic control in 3D facial expression editing. The authors propose a high-fidelity, view-consistent 3D face synthesis method that enables precise expression manipulation by jointly optimizing the texture latent codes of a pre-trained 3D-aware GAN and the expression parameters of a 3D Morphable Model (3DMM). Key innovations include a Dual Mappers module that disentangles texture and geometry representations, a CLIP-based text-guided optimization mechanism, and a Subspace Projection technique that facilitates fine-grained semantic control within the expression subspace. Experimental results demonstrate that the proposed approach outperforms existing methods in terms of generation quality, view consistency, and expression control accuracy.
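The summary above describes jointly refining two sets of parameters, the texture latent code of a pretrained 3D-aware GAN and the expression code of a 3DMM, under a CLIP-based objective. The paper provides no code here; as a minimal illustrative sketch of that joint-refinement loop, with random linear maps standing in for the pretrained renderer and CLIP encoders, and finite-difference gradients in place of backpropagation (all names and shapes below are hypothetical):

```python
import numpy as np

# Hypothetical stand-ins: the real method uses a pretrained 3D-aware GAN
# renderer and CLIP encoders; a fixed random linear map plays both roles here.
rng = np.random.default_rng(0)
render = rng.standard_normal((16, 8))  # maps [w; psi] -> "image embedding"

def image_embedding(w, psi):
    """Toy renderer + encoder applied to texture latent w and expression code psi."""
    return render @ np.concatenate([w, psi])

def clip_loss(w, psi, target):
    """Cosine-style distance between the rendered embedding and a text target."""
    e = image_embedding(w, psi)
    return 1.0 - e @ target / (np.linalg.norm(e) * np.linalg.norm(target) + 1e-8)

def grad(f, x, eps=1e-5):
    """Central finite-difference gradient (stand-in for autodiff)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

w = rng.standard_normal(4)        # texture latent (GAN latent space)
psi = rng.standard_normal(4)      # expression code (3DMM parameters)
target = rng.standard_normal(16)  # stand-in for a CLIP text embedding

lr = 0.1
loss0 = clip_loss(w, psi, target)
for _ in range(200):
    # Jointly refine both codes, as in the paper's Dual Mappers optimization.
    w -= lr * grad(lambda v: clip_loss(v, psi, target), w)
    psi -= lr * grad(lambda v: clip_loss(w, v, target), psi)
print(loss0, clip_loss(w, psi, target))
```

The point of the sketch is only the structure: one loss, two parameter groups updated together, driven toward a text-derived target.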

📝 Abstract
Facial expression editing methods can be mainly categorized into two types based on their architectures: 2D-based and 3D-based methods. The former lacks 3D face modeling capabilities, making it difficult to edit 3D factors effectively. The latter has demonstrated superior performance in generating high-quality and view-consistent renderings using single-view 2D face images. Although these methods have successfully used animatable models to control facial expressions, they still have limitations in achieving precise control over fine-grained expressions. To address this issue, in this paper, we propose a novel approach by simultaneously refining both the latent code of a pretrained 3D-Aware GAN model for texture editing and the expression code of the driven 3DMM model for mesh editing. Specifically, we introduce a Dual Mappers module, comprising Texture Mapper and Emotion Mapper, to learn the transformations of the given latent code for textures and the expression code for meshes, respectively. To optimize the Dual Mappers, we propose a Text-Guided Optimization method, leveraging a CLIP-based objective function with expression text prompts as targets, while integrating a SubSpace Projection mechanism to project the text embedding to the expression subspace such that we can have more precise control over fine-grained expressions. Extensive experiments and comparative analyses demonstrate the effectiveness and superiority of our proposed method.
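The SubSpace Projection mechanism mentioned in the abstract maps the text embedding into the expression subspace before it drives the optimization. Geometrically this is an orthogonal projection onto a basis for that subspace. A minimal sketch, assuming a hypothetical orthonormal basis `B` (in the real method the basis would be tied to the 3DMM expression directions, not random data):

```python
import numpy as np

# Hypothetical 5-dimensional expression subspace inside a 16-dim embedding
# space, represented by an orthonormal basis B (columns span the subspace).
rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((16, 5)))

def project_to_subspace(t, B):
    """Orthogonal projection of an embedding t onto span(B): B (B^T t)."""
    return B @ (B.T @ t)

t = rng.standard_normal(16)            # stand-in CLIP text embedding
t_expr = project_to_subspace(t, B)     # expression-subspace component
residual = t - t_expr                  # part discarded by the projection
print(np.abs(B.T @ residual).max())    # residual is orthogonal to the subspace
```

Restricting the target to this subspace is what keeps the optimization from drifting along non-expression directions, which is how the method gains fine-grained control.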
Problem

Research questions and friction points this paper is trying to address.

3D facial avatar
fine-grained expressions
expression control
3D face modeling
facial animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D facial avatar
fine-grained expression control
Dual Mappers
text-guided optimization
3DMM
Yikang He
Institute of Information Science, Beijing Jiaotong University, Beijing, China
Jichao Zhang
Ocean University of China | University of Trento
Generative Model · Computer Graphics · Computer Vision · Neural Rendering
Wei Wang
Institute of Information Science, Beijing Jiaotong University, Beijing, China
Nicu Sebe
University of Trento
computer vision · multimedia
Yao Zhao
Institute of Information Science, Beijing Jiaotong University, Beijing, China