🤖 AI Summary
This work addresses identity distortion in face inpainting, where occluded or missing facial regions must be reconstructed, by proposing ID-ControlNet, a ControlNet-based latent diffusion framework. The method uses identity embeddings extracted from a pretrained face recognition network as conditioning signals that guide the diffusion process while it reconstructs occluded areas. During training, it combines an explicit identity consistency constraint with a triplet loss to align the generated face with the target identity. The work targets the limitations of prior identity-aware methods, which typically rely on costly fine-tuning or auxiliary supervision, or show limited robustness to diverse occlusions, poses, and facial variations. Experiments on CelebA-HQ, FFHQ, and the newly introduced E-Mask dataset show that the method significantly improves identity preservation over standard diffusion-based inpainting techniques and performs comparably to state-of-the-art identity-aware approaches.
📝 Abstract
Face inpainting techniques recover missing or occluded facial regions in a visually realistic manner, but preserving identity in the final output remains a fundamental challenge. Identity consistency is crucial for downstream applications such as face recognition, digital forensics, and human-computer interaction, where even subtle identity distortions can significantly degrade performance or trust. Although diffusion-based generative models have recently achieved remarkable progress in image inpainting, they often struggle to faithfully retain individual-specific facial characteristics. Existing identity-aware methods, on the other hand, typically rely on costly fine-tuning or auxiliary supervision, or exhibit limited robustness to diverse occlusions, poses, and facial variations. To address these limitations, we propose ID-ControlNet, an identity-preserving face inpainting framework built upon latent diffusion models. Based on the ControlNet architecture, our approach conditions the diffusion process on facial identity embeddings extracted from a pretrained face recognition network. This design enables reconstruction of occluded facial regions while maintaining global facial coherence and identity fidelity. Furthermore, we introduce a training strategy that combines an identity consistency loss with a triplet loss to explicitly enforce alignment between the generated face and the target identity representation. Extensive experiments on CelebA-HQ, FFHQ, and a new E-Mask dataset demonstrate that ID-ControlNet significantly improves identity preservation over standard diffusion-based inpainting methods, achieving performance comparable to state-of-the-art identity-aware approaches.
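
The abstract does not give the exact form of the identity consistency and triplet terms. The following is a minimal sketch of how such an objective could look, not the authors' implementation. It assumes a cosine-distance identity term, a squared-Euclidean triplet term with a margin, and a generated-face embedding as anchor, the target identity as positive, and a different identity as negative. All function names, the margin, and the weights `w_id` and `w_tri` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each embedding (row) to approximately unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def identity_consistency_loss(gen_emb, target_emb):
    # Assumed form: 1 - cosine similarity, averaged over the batch.
    g = l2_normalize(gen_emb)
    t = l2_normalize(target_emb)
    return float(np.mean(1.0 - np.sum(g * t, axis=-1)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard hinge-style triplet loss on normalized embeddings:
    # the anchor should be closer to the positive than to the negative by `margin`.
    a, p, n = (l2_normalize(e) for e in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)
    d_neg = np.sum((a - n) ** 2, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def identity_objective(gen_emb, target_emb, other_emb,
                       w_id=1.0, w_tri=1.0, margin=0.2):
    # Hypothetical weighted sum of the two identity terms; in practice this
    # would be added to the usual diffusion denoising loss.
    return (w_id * identity_consistency_loss(gen_emb, target_emb)
            + w_tri * triplet_loss(gen_emb, target_emb, other_emb, margin))
```

Here `gen_emb` would hold the face recognition embeddings of the inpainted outputs, `target_emb` those of the ground-truth identity, and `other_emb` those of a different person, each with shape `(batch, dim)`. The objective approaches zero when the generated face matches the target identity and lies far from the other identity.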