🤖 AI Summary
This work addresses the challenge of achieving fine-grained semantic control in 3D facial expression editing. The authors propose a high-fidelity, view-consistent 3D face synthesis method that enables precise expression manipulation by jointly optimizing the texture latent codes of a pre-trained 3D-aware GAN and the expression parameters of a 3D Morphable Model (3DMM). Key innovations include a Dual Mappers module that disentangles texture and geometry representations, a CLIP-based text-guided optimization mechanism, and a Subspace Projection technique that facilitates fine-grained semantic control within the expression subspace. Experimental results demonstrate that the proposed approach outperforms existing methods in terms of generation quality, view consistency, and expression control accuracy.
📝 Abstract
Facial expression editing methods can be broadly categorized into two types by architecture: 2D-based and 3D-based. The former lacks 3D face modeling capabilities, making it difficult to edit 3D factors effectively. The latter has demonstrated superior performance in generating high-quality, view-consistent renderings from single-view 2D face images. Although these methods successfully use animatable models to control facial expressions, they remain limited in achieving precise control over fine-grained expressions. To address this issue, we propose a novel approach that simultaneously refines the latent code of a pre-trained 3D-aware GAN model for texture editing and the expression code of the driving 3DMM model for mesh editing. Specifically, we introduce a Dual Mappers module, comprising a Texture Mapper and an Emotion Mapper, to learn the transformations of the given latent code for textures and the expression code for meshes, respectively. To optimize the Dual Mappers, we propose a Text-Guided Optimization method that leverages a CLIP-based objective function with expression text prompts as targets, and we integrate a Subspace Projection mechanism that projects the text embedding into the expression subspace, enabling more precise control over fine-grained expressions. Extensive experiments and comparative analyses demonstrate the effectiveness and superiority of our proposed method.
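To make the two core ingredients concrete, here is a minimal NumPy sketch of (a) projecting a text embedding onto an expression subspace via least squares and (b) a CLIP-style cosine-similarity objective. The function names, the choice of a least-squares projection, and the basis construction are illustrative assumptions; the paper's actual subspace and objective details are not specified here.

```python
import numpy as np

def project_to_subspace(text_emb, basis):
    """Project a text embedding onto span(basis rows) via least squares.

    NOTE: illustrative assumption -- the paper's exact Subspace Projection
    construction may differ.
    text_emb: (d,) embedding vector.
    basis:    (k, d) rows spanning the expression subspace.
    """
    # Solve argmin_c || basis.T @ c - text_emb ||_2, then map back.
    coeffs, *_ = np.linalg.lstsq(basis.T, text_emb, rcond=None)
    return basis.T @ coeffs

def clip_style_loss(image_emb, text_emb):
    """1 - cosine similarity, the usual form of a CLIP-guided objective."""
    num = float(image_emb @ text_emb)
    den = float(np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return 1.0 - num / den

# Toy example: a 2-vector subspace inside R^4.
basis = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
emb = np.array([3.0, 4.0, 5.0, 6.0])
proj = project_to_subspace(emb, basis)
# proj keeps only the components of emb that lie in the expression subspace.
loss = clip_style_loss(proj, proj)  # identical embeddings give zero loss
```

In the optimization described above, the loss would compare a CLIP embedding of the rendered face against the projected embedding of the expression prompt, with gradients flowing back into the Dual Mappers.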