DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenges of distorted detail generation and poor identity preservation in blind face restoration, this paper proposes a Reward Feedback Learning (ReFL) framework, the first to incorporate a differentiable gradient-guidance mechanism into diffusion-model optimization for this task. The core contribution is a dynamically trained Face Reward Model (FRM) that jointly encodes perceptual quality, diversity regularization, and structural consistency constraints, effectively mitigating reward hacking. ReFL enables end-to-end co-training of the FRM and the diffusion model, allowing fine-grained modulation of the guiding gradient flow. Extensive experiments on both synthetic and real-world datasets demonstrate significant improvements over state-of-the-art methods: identity consistency increases by 12.6%, while detail sharpness and visual naturalness are markedly enhanced. These results validate the effectiveness and generalizability of reward-driven diffusion optimization for blind face restoration.

📝 Abstract
Reward Feedback Learning (ReFL) has recently shown great potential in aligning model outputs with human preferences across various generative tasks. In this work, we introduce a ReFL framework, named DiffusionReward, to the Blind Face Restoration task for the first time. DiffusionReward effectively overcomes the limitations of diffusion-based methods, which often fail to generate realistic facial details and exhibit poor identity consistency. The core of our framework is the Face Reward Model (FRM), which is trained using carefully annotated data. It provides feedback signals that play a pivotal role in steering the optimization process of the restoration network. In particular, our ReFL framework incorporates a gradient flow into the denoising process of off-the-shelf face restoration methods to guide the update of model parameters. The guiding gradient is collaboratively determined by three aspects: (i) the FRM to ensure the perceptual quality of the restored faces; (ii) a regularization term that functions as a safeguard to preserve generative diversity; and (iii) a structural consistency constraint to maintain facial fidelity. Furthermore, the FRM undergoes dynamic optimization throughout the process. It not only ensures that the restoration network stays precisely aligned with the real face manifold, but also effectively prevents reward hacking. Experiments on synthetic and wild datasets demonstrate that our method outperforms state-of-the-art methods, significantly improving identity consistency and facial details. The source codes, data, and models are available at: https://github.com/01NeuralNinja/DiffusionReward.
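The three-part guiding gradient described in the abstract can be pictured as a single differentiable objective whose gradient steers the restoration network. The sketch below is an illustrative reconstruction, not the paper's code: `frm` stands in for the Face Reward Model, and the diversity regularizer (a batch-variance penalty) and structural term (L1 to a structural reference) are simple placeholder choices for components the abstract only names.

```python
import torch
import torch.nn.functional as F

def guidance_loss(restored, struct_ref, frm, lam_reg=0.1, lam_struct=1.0):
    """Hypothetical combination of the three feedback signals:
    (i) FRM reward for perceptual quality (maximized, hence negated),
    (ii) a regularizer safeguarding generative diversity,
    (iii) a structural consistency term preserving facial fidelity."""
    reward = frm(restored).mean()                            # (i) perceptual quality
    batch_mean = restored.mean(dim=0, keepdim=True)
    diversity_reg = -((restored - batch_mean) ** 2).mean()   # (ii) placeholder: penalize collapse
    struct = F.l1_loss(restored, struct_ref)                 # (iii) fidelity to facial structure
    return -reward + lam_reg * diversity_reg + lam_struct * struct
```

In the framework described above, the gradient of such a scalar would flow back through the denoising step of an off-the-shelf restoration method to update its parameters.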
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of diffusion-based face restoration methods
Improving identity consistency and facial detail realism
Aligning model outputs with human preferences via feedback learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Reward Feedback Learning for face restoration
Incorporates Face Reward Model for quality feedback
Dynamic optimization prevents reward hacking
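The dynamic optimization mentioned above implies the FRM is refreshed alongside the restorer rather than frozen. One plausible form, sketched here purely as an assumption (the paper does not publish this loop in the text above), is a pairwise ranking update that keeps real high-quality faces scoring above current restorer outputs, so the restorer cannot exploit a stale reward:

```python
import torch
import torch.nn.functional as F

def frm_update_step(frm, restored, real_hq, optimizer):
    """Hypothetical FRM refresh: push the reward model to rank real
    high-quality faces above current restorer outputs. A positive
    margin means the restorer is out-scoring real faces, i.e. a
    reward-hacking signal the update suppresses."""
    margin = frm(restored.detach()) - frm(real_hq)
    loss = F.softplus(margin).mean()  # Bradley-Terry-style pairwise ranking loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Alternating this step with restorer updates is one standard way to keep a learned reward aligned with the real-face manifold during co-training.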
Bin Wu
Institute of Information Science, Beijing Jiaotong University; Visual Intelligence +X International Cooperation Joint Laboratory of MOE
Wei Wang
Institute of Information Science, Beijing Jiaotong University; Visual Intelligence +X International Cooperation Joint Laboratory of MOE
Yahui Liu
Kuaishou Technology
Zixiang Li
Beijing Jiaotong University
Yao Zhao
Institute of Information Science, Beijing Jiaotong University; Visual Intelligence +X International Cooperation Joint Laboratory of MOE