🤖 AI Summary
Blind face restoration suffers from information asymmetry between sparse inputs and dense outputs, which often leads to stochastic uncertainty and hallucinatory artifacts. To address this, the work proposes Pref-Restore, a framework that introduces on-policy reinforcement learning into the diffusion-based restoration loop for the first time. Human preferences are encoded as differentiable constraints, and an autoregressive integrator maps textual instructions into dense latent queries, unifying discrete semantic logic with continuous texture generation. Together, these enable preference-aligned, deterministic restoration. The method achieves state-of-the-art performance on both synthetic and real-world benchmarks, significantly reducing the entropy of the solution space and improving the determinism and fidelity of restored results.
📝 Abstract
Blind face restoration remains a persistent challenge due to the inherent ill-posedness of reconstructing holistic structures from severely constrained observations. Current generative approaches, while capable of synthesizing realistic textures, often suffer from information asymmetry: the intrinsic disparity between information-sparse low-quality inputs and information-dense high-quality outputs. This imbalance leads to a one-to-many mapping, where insufficient constraints result in stochastic uncertainty and hallucinatory artifacts. To bridge this gap, we present **Pref-Restore**, a hierarchical framework that integrates discrete semantic logic with continuous texture generation to achieve deterministic, preference-aligned restoration. Our methodology fundamentally addresses this information disparity through two complementary strategies: (1) Augmenting Input Density: we employ an autoregressive integrator to reformulate textual instructions into dense latent queries, injecting high-level semantic stability to constrain the degraded signals; (2) Pruning Output Distribution: we pioneer the integration of on-policy reinforcement learning directly into the diffusion restoration loop. By transforming human preferences into differentiable constraints, we explicitly penalize stochastic deviations, thereby sharpening the posterior distribution toward the desired high-fidelity outcomes. Extensive experiments demonstrate that Pref-Restore achieves state-of-the-art performance across synthetic and real-world benchmarks. Furthermore, empirical analysis confirms that our preference-aligned strategy significantly reduces solution entropy, establishing a robust pathway toward reliable and deterministic blind restoration.
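The first strategy (augmenting input density) amounts to a learned conditioning module. Below is a minimal PyTorch sketch of how an autoregressive integrator *might* decode text-encoder tokens into a fixed set of dense latent queries for cross-attention conditioning; the class name `ARIntegrator`, the query count, and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ARIntegrator(nn.Module):
    """Hypothetical sketch: autoregressively decode text tokens into a
    fixed set of dense latent queries that can condition a diffusion
    restorer via cross-attention. Names and sizes are assumptions."""
    def __init__(self, d_model=512, num_queries=64, n_layers=4, n_heads=8):
        super().__init__()
        self.num_queries = num_queries
        self.query_embed = nn.Embedding(num_queries, d_model)  # learned start queries
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)

    def forward(self, text_tokens):
        # text_tokens: (B, T, d_model) from a frozen text encoder
        B = text_tokens.size(0)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        # causal mask: query i attends only to queries < i (autoregressive),
        # while every query cross-attends to all text tokens
        causal = torch.full((self.num_queries, self.num_queries),
                            float("-inf"), device=text_tokens.device).triu(1)
        return self.decoder(queries, text_tokens, tgt_mask=causal)  # (B, num_queries, d_model)
```

The returned queries would replace or augment the usual text-embedding sequence fed to the restorer's cross-attention layers, densifying the sparse textual instruction into a richer conditioning signal.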
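The second strategy (pruning the output distribution) reads like reward-backpropagation fine-tuning of the diffusion policy. The sketch below assumes a diffusers-style UNet/scheduler interface and a differentiable preference reward model; the truncated-backprop window `k_grad` and all names are assumptions, one plausible instantiation rather than the paper's stated algorithm.

```python
import torch

def preference_step(unet, scheduler, reward_model, cond, latents, optimizer, k_grad=3):
    """Hypothetical on-policy preference step: roll out the *current*
    diffusion policy, score the result with a differentiable preference
    reward, and descend its negative so that stochastic deviations from
    preferred outcomes are explicitly penalized."""
    timesteps = scheduler.timesteps
    n = len(timesteps)
    for i, t in enumerate(timesteps):
        # truncated backprop: only the last k_grad denoising steps keep a graph
        grad_ctx = torch.enable_grad() if i >= n - k_grad else torch.no_grad()
        with grad_ctx:
            eps = unet(latents, t, encoder_hidden_states=cond).sample
            latents = scheduler.step(eps, t, latents).prev_sample
        if i < n - k_grad:
            latents = latents.detach()
    # in practice the latents would first be decoded to pixels (e.g. by a VAE)
    reward = reward_model(latents).mean()  # differentiable preference score
    loss = -reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because samples are drawn from the very model being updated, the procedure is on-policy; maximizing the preference reward concentrates probability mass on preferred reconstructions, which is one plausible reading of the reported reduction in solution entropy.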