Towards Consistent and Controllable Image Synthesis for Face Editing

📅 2025-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion models struggle with disentangled control and cross-attribute consistency in fine-grained facial attribute editing (e.g., pose, expression, illumination). This paper introduces RigFace, the first framework to deeply integrate a coarse-grained 3D face model with Stable Diffusion. We design a spatial attribute encoder and an identity encoder operating in tandem to inject 3D-aware conditioning into the UNet denoising process; feature modulation enables multi-attribute disentanglement while strongly preserving identity. RigFace achieves state-of-the-art performance in identity fidelity and perceptual realism, supporting high-precision, independently controllable editing of pose, expression, and illumination. Quantitative and qualitative evaluations demonstrate significant improvements in editing consistency and controllability over prior methods, without compromising visual quality or identity integrity.

📝 Abstract
Current face editing methods mainly rely on GAN-based techniques, but recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in manipulating fine-grained attributes and preserving consistency of attributes that should remain unchanged. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion models and crude 3D face models to control the lighting, facial expression and head pose of a portrait photo. We observe that this task essentially involves combinations of target background, identity and different face attributes. We aim to sufficiently disentangle the control of these factors to enable high-quality face editing. Specifically, our method, coined as RigFace, contains: 1) A Spatial Attribute Encoder that provides precise and decoupled conditions of background, pose, expression and lighting; 2) An Identity Encoder that transfers identity features to the denoising UNet of a pre-trained Stable-Diffusion model; 3) An Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
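The abstract names three components that feed a pre-trained denoising UNet: a Spatial Attribute Encoder (decoupled background, pose, expression, and lighting conditions), an Identity Encoder (identity features), and an Attribute Rigger (condition injection). The sketch below is a minimal, purely illustrative rendering of that data flow; every class and function name here is an assumption, not the authors' actual API, and the numeric operations are placeholders for the real learned modules.

```python
# Hypothetical sketch of the RigFace conditioning flow described in the
# abstract. All names and operations are illustrative assumptions; the
# real encoders are learned networks and the "latent" is a UNet latent.

from dataclasses import dataclass
from typing import List


@dataclass
class Conditions:
    """Decoupled conditions from the Spatial Attribute Encoder."""
    background: List[float]
    pose: List[float]
    expression: List[float]
    lighting: List[float]


def spatial_attribute_encoder(render_3d: List[float]) -> Conditions:
    # Placeholder: the paper encodes a crude 3D face model render into
    # precise, disentangled condition signals.
    return Conditions(background=render_3d,
                      pose=[0.1], expression=[0.2], lighting=[0.3])


def identity_encoder(source_face: List[float]) -> List[float]:
    # Placeholder: transfers identity features to the denoising UNet.
    return [sum(source_face) / max(len(source_face), 1)]


def attribute_rigger(latent: List[float], cond: Conditions,
                     id_feat: List[float]) -> List[float]:
    # Placeholder: injects conditions and identity into a denoising step.
    bias = sum(cond.pose + cond.expression + cond.lighting + id_feat)
    return [z + bias for z in latent]


# One (greatly simplified) conditioned denoising step.
cond = spatial_attribute_encoder([0.5, 0.5])
id_feat = identity_encoder([0.4, 0.6])
latent = attribute_rigger([0.0, 1.0], cond, id_feat)
print(latent)
```

In the actual system, each editable factor (pose, expression, lighting) would arrive as its own condition map so it can be changed independently while the identity features hold the subject constant.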
Problem

Research questions and friction points this paper is trying to address.

Improve fine-grained face attribute manipulation
Preserve consistency in unchanged face attributes
Control lighting, expression, and pose in portraits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integration with Stable-Diffusion models
Crude 3D face models for control
Spatial Attribute Encoder for decoupled conditions
Mengting Wei
University of Oulu
Computer Vision · Face Generation
Tuomas Varanka
Doctoral researcher, University of Oulu
Yante Li
University of Oulu
Computer Vision · Affective Computing · Deep Learning
Xingxun Jiang
Key Laboratory of Child Development and Learning Science of Ministry of Education, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China; Center for Machine Vision and Signal Analysis, Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, FI-90014, Finland
Huai-Qian Khor
University of Oulu
CVML
Guoying Zhao
Academy Professor, IEEE Fellow, Professor of Computer Science and Engineering, University of Oulu
Affective Computing · Artificial Intelligence · Computer Vision · Pattern Recognition