Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model

πŸ“… 2024-10-05
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 3
✨ Influential: 1
πŸ€– AI Summary
To address spurious attribute generation and identity distortion in the restoration of low-quality facial images under real-world conditions, this paper proposes a multimodal controllable reconstruction framework. Methodologically, it introduces a dual-control adapter architecture coupled with a two-stage training strategy that integrates attribute text prompts, high-quality reference images, and explicit identity constraints; it further incorporates negative quality prompting and fine-grained attribute modulation. The authors also construct Reface-HQ, a large-scale reference-face dataset of over 21,000 high-resolution facial images spanning 4,800 identities, to support reference-guided, identity-aware reconstruction. Extensive experiments demonstrate that the method improves detail recovery and identity fidelity under severe degradation, achieving superior visual quality over current state-of-the-art methods while enabling controllable, precise, and perceptually realistic facial reconstruction.

πŸ“ Abstract
We introduce a novel Multi-modal Guided Real-World Face Restoration (MGFR) technique designed to improve the quality of facial image restoration from low-quality inputs. Leveraging a blend of attribute text prompts, high-quality reference images, and identity information, MGFR can mitigate the generation of false facial attributes and identities often associated with generative face restoration methods. By incorporating a dual-control adapter and a two-stage training strategy, our method effectively utilizes multi-modal prior information for targeted restoration tasks. We also present the Reface-HQ dataset, comprising over 21,000 high-resolution facial images across 4800 identities, to address the need for reference face training images. Our approach achieves superior visual quality in restoring facial details under severe degradation and allows for controlled restoration processes, enhancing the accuracy of identity preservation and attribute correction. Including negative quality samples and attribute prompts in the training further refines the model's ability to generate detailed and perceptually accurate images.
Problem

Research questions and friction points this paper is trying to address.

Improve facial image restoration from low-quality inputs
Mitigate the generation of false facial attributes and identities
Enhance identity preservation and attribute correction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal guided diffusion model
Dual-control adapter strategy
Two-stage training approach
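The negative quality prompting mentioned above can be illustrated with an extended classifier-free guidance rule: alongside the usual unconditional and positive-prompt noise predictions, a third prediction conditioned on a negative quality prompt (e.g. "blurry, low quality") is pushed away from. The sketch below is a minimal illustration of this general technique, not the paper's exact formulation; the function name, weights, and per-element list representation are assumptions for clarity.

```python
def guided_noise(eps_uncond, eps_pos, eps_neg, w_pos=4.0, w_neg=1.5):
    """Combine three noise predictions from a diffusion model:

    eps_uncond -- prediction with no conditioning
    eps_pos    -- prediction conditioned on the positive attribute prompt
    eps_neg    -- prediction conditioned on the negative quality prompt

    The result is steered toward the positive prompt (weight w_pos)
    and away from the negative quality prompt (weight w_neg).
    Inputs are flat lists of floats standing in for noise tensors.
    """
    return [
        u + w_pos * (p - u) - w_neg * (n - u)
        for u, p, n in zip(eps_uncond, eps_pos, eps_neg)
    ]


# Toy usage: with u=0, a positive pull of +1 and a negative push from -1,
# the guided prediction moves further toward the positive direction.
print(guided_noise([0.0], [1.0], [-1.0], w_pos=2.0, w_neg=1.0))  # [3.0]
```

In practice the same three forward passes would run through the denoising U-Net at every sampling step, so the extra negative branch costs one additional model evaluation per step.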