OSDFace: One-Step Diffusion Model for Face Restoration

๐Ÿ“… 2024-11-26
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 1
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing diffusion models for face restoration suffer from excessive inference steps, high computational cost, identity distortion, and insufficient photorealism. To address these issues, we propose OSDFace, a one-step diffusion-based face restoration framework. Our method introduces a Visual Representation Embedder (VRE) that generates vector-quantized visual prompts to enhance semantic controllability; incorporates an identity-consistency loss guided by face recognition to preserve subject identity; and employs a GAN-guided distribution alignment mechanism to improve texture fidelity and naturalness. Quantitatively, our approach achieves state-of-the-art performance on LPIPS, FID, and identity-similarity metrics, while also attaining superior subjective quality. Crucially, it enables single-step generation at millisecond-level latency without compromising reconstruction fidelity, marking a significant step toward practical, real-time diffusion-based face restoration.
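As a rough illustration of the one-step pipeline summarized above, the sketch below shows how a single fixed-timestep denoising pass could be wired up in a diffusers-style PyTorch setting. All module names (`vre`, `unet`, `vae`, `scheduler`) are hypothetical placeholders for illustration, not the released OSDFace code.

```python
import torch

@torch.no_grad()
def restore_one_step(lq_face, vre, unet, vae, scheduler, t_fixed=999):
    """Hypothetical single-step restoration pass; module names are
    illustrative placeholders, not the released OSDFace API."""
    # Encode the low-quality face into the VAE latent space.
    z_lq = vae.encode(lq_face).latent_dist.sample() * vae.config.scaling_factor

    # The visual representation embedder turns the LQ face into visual prompt tokens.
    visual_prompts = vre(lq_face)  # (B, N, prompt_dim)

    # One denoising pass at a fixed timestep, conditioned on the visual prompts.
    t = torch.full((z_lq.shape[0],), t_fixed, device=z_lq.device, dtype=torch.long)
    noise_pred = unet(z_lq, t, encoder_hidden_states=visual_prompts).sample

    # Recover the clean latent in a single scheduler step and decode to image space.
    z_hq = scheduler.step(noise_pred, t_fixed, z_lq).pred_original_sample
    x_hq = vae.decode(z_hq / vae.config.scaling_factor).sample
    return x_hq.clamp(-1, 1)
```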

๐Ÿ“ Abstract
Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency. The code and model will be released at https://github.com/jkwang28/OSDFace.
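To make the VRE description in the abstract concrete, here is a minimal PyTorch sketch of a vector-quantized prompt embedder: the LQ face is tokenized, each token is snapped to its nearest entry in a vector-quantized dictionary, and the quantized codes are projected into prompt embeddings for the diffusion network. The module names and dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VisualRepresentationEmbedder(nn.Module):
    """Sketch of a VRE-style module (names and shapes are assumptions)."""

    def __init__(self, tokenizer: nn.Module, codebook_size=1024, code_dim=256, prompt_dim=768):
        super().__init__()
        self.tokenizer = tokenizer                        # maps LQ face -> (B, N, code_dim) tokens
        self.codebook = nn.Embedding(codebook_size, code_dim)  # vector-quantized dictionary
        self.to_prompt = nn.Linear(code_dim, prompt_dim)  # project codes to the prompt space

    def forward(self, lq_face):
        tokens = self.tokenizer(lq_face)                  # (B, N, code_dim)
        # Nearest-neighbour lookup in the VQ dictionary (L2 distance).
        flat = tokens.reshape(-1, tokens.size(-1))        # (B*N, code_dim)
        dists = torch.cdist(flat, self.codebook.weight)   # (B*N, codebook_size)
        indices = dists.argmin(dim=-1).reshape(tokens.shape[:2])  # (B, N)
        quantized = self.codebook(indices)                # (B, N, code_dim)
        return self.to_prompt(quantized)                  # (B, N, prompt_dim) visual prompts
```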
Problem

Research questions and friction points this paper is trying to address.

Reducing computational intensity of diffusion models for face restoration
Improving harmony and realism in restored face images
Ensuring identity consistency in face restoration outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step diffusion model for fast restoration
Visual representation embedder captures prior information
GAN guidance ensures realistic, identity-consistent results (see the training-loss sketch after this list)
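The last two contributions can be read as parts of a training objective. Below is a hedged PyTorch sketch of how an identity loss from a frozen face-recognition network and a hinge-style GAN guidance term could be combined with a reconstruction term; `face_recognizer`, `discriminator`, and the loss weights are placeholders rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def identity_loss(restored, target, face_recognizer):
    """Cosine-distance identity loss using embeddings from a frozen
    face-recognition network; `face_recognizer` is a placeholder."""
    emb_r = F.normalize(face_recognizer(restored), dim=-1)
    emb_t = F.normalize(face_recognizer(target), dim=-1)
    return (1.0 - (emb_r * emb_t).sum(dim=-1)).mean()

def gan_guidance_losses(restored, target, discriminator):
    """Hinge-style adversarial losses that push the restored-face
    distribution toward the ground-truth distribution."""
    d_real = discriminator(target)
    d_fake = discriminator(restored.detach())
    loss_d = F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()
    loss_g = -discriminator(restored).mean()
    return loss_d, loss_g

def generator_objective(restored, target, face_recognizer, discriminator,
                        w_id=1.0, w_gan=0.1):
    """Combined one-step generator objective: reconstruction term plus
    identity consistency plus GAN guidance. Weights are illustrative."""
    loss_rec = F.l1_loss(restored, target)
    loss_id = identity_loss(restored, target, face_recognizer)
    _, loss_g = gan_guidance_losses(restored, target, discriminator)
    return loss_rec + w_id * loss_id + w_gan * loss_g
```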
๐Ÿ”Ž Similar Papers
No similar papers found.
Jingkai Wang
Shanghai Jiao Tong University
Jue Gong
Shanghai Jiao Tong University
Computer Vision · Image Restoration
Lin Zhang
Shanghai Jiao Tong University
Zheng Chen
Shanghai Jiao Tong University
Xingang Liu
University of Electronic Science and Technology of China
information
Hong Gu
National Institute on Drug Abuse, NIH
functional MRI · functional connectivity · drug addiction
Yutong Liu
Shanghai Jiao Tong University
Yulun Zhang
Shanghai Jiao Tong University
Xiaokang Yang
Shanghai Jiao Tong University