OSDFace: One-Step Diffusion Model for Face Restoration

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0

🤖 AI Summary
Existing diffusion models for face restoration suffer from excessive inference steps, high computational cost, identity distortion, and insufficient photorealism. To address these issues, we propose the first single-step diffusion-based face restoration framework. Our method introduces a Visual Representation Embedder (VRE) that turns the low-quality input into vector-quantized visual prompts to better capture facial priors; incorporates an identity-consistency loss guided by face recognition to preserve subject identity; and employs a GAN-guided distribution alignment mechanism to improve texture fidelity and naturalness. Quantitatively, our approach achieves state-of-the-art performance across LPIPS, FID, and ID Similarity metrics, while also attaining superior subjective quality. Crucially, it enables millisecond-level single-step generation without compromising reconstruction fidelity, marking a significant step toward practical, real-time diffusion-based face restoration.
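The identity-consistency loss can be pictured as a distance between face-recognition embeddings of the restored face and the ground truth. The sketch below assumes the common "1 minus cosine similarity" form; that choice is an assumption, not OSDFace's confirmed formulation. `fake_face_embedder` is a hypothetical stand-in (a fixed random linear projection) for a frozen face recognition network.

```python
import numpy as np


def fake_face_embedder(images: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen face recognition network.

    Flattens each (C, H, W) image and applies a fixed linear projection:
    (batch, C*H*W) @ (C*H*W, dim) -> (batch, dim).
    """
    return images.reshape(images.shape[0], -1) @ proj


def identity_loss(restored_emb: np.ndarray, gt_emb: np.ndarray, eps: float = 1e-8) -> float:
    """Mean (1 - cosine similarity) between restored and ground-truth embeddings.

    Close to 0 when identities match; close to 2 for opposite embeddings.
    """
    a = restored_emb / (np.linalg.norm(restored_emb, axis=1, keepdims=True) + eps)
    b = gt_emb / (np.linalg.norm(gt_emb, axis=1, keepdims=True) + eps)
    cos_sim = np.sum(a * b, axis=1)  # per-sample cosine similarity
    return float(np.mean(1.0 - cos_sim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.standard_normal((2, 3, 8, 8))               # toy "ground-truth faces"
    restored = gt + 0.1 * rng.standard_normal(gt.shape)  # slightly perturbed restorations
    proj = rng.standard_normal((3 * 8 * 8, 16))
    loss = identity_loss(fake_face_embedder(restored, proj), fake_face_embedder(gt, proj))
    print(f"identity loss: {loss:.4f}")  # expected to be close to 0
```

In training, a term like this would be added to the restoration objective with some weight, so the generator is penalized whenever the recognizer sees a different person, even if pixel-level error is low.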

๐Ÿ“ Abstract
Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency. The code and model will be released at https://github.com/jkwang28/OSDFace.
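In the VRE, the low-quality face is tokenized and the token features are embedded with a vector-quantized dictionary to form visual prompts. The core of such an embedding is a nearest-neighbour lookup: each token feature is replaced by its closest dictionary (codebook) entry. The sketch below shows only that lookup under squared Euclidean distance. The visual tokenizer, the codebook and token sizes, and how the prompts are injected into the diffusion model (for example, through cross-attention in place of text embeddings) are assumptions, not details given in the abstract.

```python
from typing import Tuple

import numpy as np


def vector_quantize(tokens: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Replace each token feature with its nearest codebook entry.

    tokens:   (num_tokens, dim) features from a visual tokenizer (hypothetical here).
    codebook: (codebook_size, dim) vector-quantized dictionary.
    Returns (indices, prompts) with prompts[i] == codebook[indices[i]].
    """
    # Squared L2 distance for every (token, code) pair:
    # ||t - c||^2 = ||t||^2 - 2 t.c + ||c||^2
    dists = (
        np.sum(tokens ** 2, axis=1, keepdims=True)
        - 2.0 * tokens @ codebook.T
        + np.sum(codebook ** 2, axis=1)[None, :]
    )
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((512, 64))  # hypothetical 512-entry dictionary
    tokens = rng.standard_normal((16, 64))     # hypothetical 16 tokens from one face
    idx, visual_prompts = vector_quantize(tokens, codebook)
    print(idx.shape, visual_prompts.shape)     # (16,) (16, 64)
```

Snapping features to a learned dictionary means degraded inputs are mapped onto entries learned from clean faces, which is one plausible reason such prompts help the model capture prior information.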

Problem

Research questions and friction points this paper is trying to address.

Reducing computational intensity of diffusion models for face restoration
Improving harmony and realism in restored face images
Ensuring identity consistency in face restoration outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step diffusion model for fast restoration
Visual representation embedder captures prior information
GAN guidance and a face-recognition identity loss yield realistic, identity-consistent results
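The speed-up of a one-step model comes from replacing an iterative sampler with a single network evaluation. One common way to do this in one-step diffusion restoration, assumed here rather than taken from the paper, is to treat the low-quality latent as the noisy latent z_t at a fixed timestep, predict the noise once, and invert the forward-process relation z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps to obtain an estimate of z_0. `noise_predictor` is a hypothetical stand-in for the diffusion UNet.

```python
from typing import Callable

import numpy as np


def predict_clean_latent(z_t: np.ndarray, eps_pred: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """Solve z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps for z_0."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def one_step_restore(
    lq_latent: np.ndarray,
    noise_predictor: Callable[[np.ndarray], np.ndarray],
    alpha_bar_t: float,
) -> np.ndarray:
    """Restore with a single network call, treating the low-quality latent
    as z_t at a fixed timestep (an assumption of this sketch)."""
    eps_pred = noise_predictor(lq_latent)  # the only network evaluation
    return predict_clean_latent(lq_latent, eps_pred, alpha_bar_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal((4, 8, 8))  # toy "clean" latent
    eps = rng.standard_normal(z0.shape)  # noise used to build z_t
    alpha_bar = 0.25
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
    # An oracle predictor that returns the true noise recovers z0 up to float error.
    z0_hat = one_step_restore(z_t, lambda z: eps, alpha_bar)
    print(np.allclose(z0_hat, z0))  # True
```

With a trained predictor the estimate is only approximate; the point of the sketch is that inference costs one forward pass, whereas a multi-step sampler pays one forward pass per step.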