AI Summary
Existing diffusion models for face inpainting suffer from excessive inference steps, high computational cost, identity distortion, and insufficient photorealism. To address these issues, we propose the first single-step diffusion-based face inpainting framework. Our method introduces a Visual Representation Embedder (VRE) that integrates vector-quantized prompts to enhance semantic controllability; incorporates an identity-consistency loss guided by face recognition to preserve subject identity; and employs a GAN-guided distribution alignment mechanism to improve texture fidelity and naturalness. Quantitatively, our approach achieves state-of-the-art performance across LPIPS, FID, and ID Similarity metrics, while also attaining superior subjective quality. Crucially, it enables millisecond-level single-step generation without compromising reconstruction fidelity, marking a significant step toward practical, real-time diffusion-based face inpainting.
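The vector-quantized prompts mentioned above rest on a standard dictionary-lookup step: each encoded feature is replaced by its nearest entry in a learned codebook. The sketch below illustrates only that nearest-neighbor quantization step in plain Python; the function and variable names are illustrative, not taken from the OSDFace implementation.

```python
def quantize(feature, codebook):
    """Map a feature vector to the nearest codebook entry (squared L2 distance),
    as in a vector-quantized dictionary lookup. Names are illustrative only."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: sq_dist(feature, codebook[i]))
    return idx, codebook[idx]

# Example: a tiny 2-entry codebook of 2-D vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
idx, code = quantize([0.9, 1.2], codebook)  # nearest entry is [1.0, 1.0]
```

In practice the codebook is learned end-to-end and the lookup runs batched on GPU, but the selection rule is the same.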
Abstract
Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency. The code and model will be released at https://github.com/jkwang28/OSDFace.
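The facial identity loss described above typically compares face-recognition embeddings of the restored face and the ground truth, penalizing low cosine similarity. A minimal sketch of such a loss, assuming the embeddings have already been extracted by a recognition network (the function name and signature are hypothetical, not OSDFace's API):

```python
import math

def identity_loss(emb_restored, emb_gt):
    """1 - cosine similarity between two face-recognition embeddings.
    Returns 0 for identical directions, up to 2 for opposite ones.
    Hypothetical helper; the paper's exact formulation may differ."""
    dot = sum(a * b for a, b in zip(emb_restored, emb_gt))
    norm = (math.sqrt(sum(a * a for a in emb_restored))
            * math.sqrt(sum(b * b for b in emb_gt)))
    return 1.0 - dot / norm

# Identical embeddings incur no penalty; orthogonal ones incur a loss of 1.
loss_same = identity_loss([1.0, 0.0], [1.0, 0.0])   # 0.0
loss_diff = identity_loss([1.0, 0.0], [0.0, 1.0])   # 1.0
```

During training, this term would be added to the reconstruction and adversarial objectives with a weighting coefficient, steering the one-step generator toward identity-preserving outputs.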