🤖 AI Summary
To address the challenges of distorted detail generation and poor identity preservation in blind face super-resolution, this paper proposes a Reward Feedback Learning (ReFL) framework, the first to incorporate a differentiable gradient-guidance mechanism into diffusion-model optimization for this task. The core contribution is a dynamically trained Face Reward Model (FRM) whose feedback, combined with a diversity-preserving regularization term and a structural consistency constraint, steers the restoration network while effectively mitigating reward hacking. ReFL enables end-to-end co-training of the FRM and the diffusion model, allowing fine-grained modulation of the guiding gradient flow. Extensive experiments on both synthetic and real-world datasets demonstrate significant improvements over state-of-the-art methods: identity consistency increases by 12.6%, while detail sharpness and visual naturalness are markedly enhanced. These results validate the effectiveness and generalizability of reward-driven diffusion optimization for blind face restoration.
📄 Abstract
Reward Feedback Learning (ReFL) has recently shown great potential in aligning model outputs with human preferences across various generative tasks. In this work, we apply ReFL to the Blind Face Restoration task for the first time, introducing a framework named DiffusionReward. DiffusionReward effectively overcomes the limitations of diffusion-based methods, which often fail to generate realistic facial details and exhibit poor identity consistency. The core of our framework is the Face Reward Model (FRM), which is trained on carefully annotated data. It provides feedback signals that play a pivotal role in steering the optimization of the restoration network. In particular, our ReFL framework injects a gradient flow into the denoising process of off-the-shelf face restoration methods to guide the update of model parameters. The guiding gradient is collaboratively determined by three components: (i) the FRM, which ensures the perceptual quality of the restored faces; (ii) a regularization term that acts as a safeguard to preserve generative diversity; and (iii) a structural consistency constraint that maintains facial fidelity. Furthermore, the FRM is dynamically optimized throughout training: this not only keeps the restoration network precisely aligned with the real face manifold, but also effectively prevents reward hacking. Experiments on synthetic and wild datasets demonstrate that our method outperforms state-of-the-art methods, significantly improving identity consistency and facial details. The source code, data, and models are available at: https://github.com/01NeuralNinja/DiffusionReward.
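The three-part guiding gradient described above can be illustrated with a minimal PyTorch sketch. This is a hedged approximation, not the authors' implementation: the function name `refl_guidance_loss`, the loss weights, and the concrete choices of MSE/L1 for the regularization and structural terms are all illustrative assumptions; `frm` stands in for the trained Face Reward Model as an arbitrary differentiable scorer.

```python
import torch
import torch.nn.functional as F

def refl_guidance_loss(restored, reference, frm, base_model_out,
                       w_reward=1.0, w_reg=0.1, w_struct=0.5):
    """Illustrative composite ReFL guidance loss (assumed form, not the paper's).

    restored:       batch of restored face images produced by the denoiser
    reference:      ground-truth (or pseudo-ground-truth) faces for structure
    frm:            callable scoring perceptual quality per image (the FRM)
    base_model_out: output of a frozen copy of the base restoration model,
                    used here as a diversity-preserving anchor
    """
    # (i) FRM reward term: higher scores are better, so negate to minimize.
    reward = frm(restored).mean()

    # (ii) Regularization term: keep outputs close to the frozen base model
    # so the fine-tuned network does not collapse generative diversity.
    reg = F.mse_loss(restored, base_model_out)

    # (iii) Structural consistency term: preserve facial fidelity
    # against the reference.
    struct = F.l1_loss(restored, reference)

    return -w_reward * reward + w_reg * reg + w_struct * struct
```

Backpropagating this scalar through the denoising step would then yield the gradient flow that updates the restoration network's parameters, with the three weights controlling the balance between perceptual quality, diversity, and fidelity.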