🤖 AI Summary
Traditional image quality assessment (IQA) models for image super-resolution (ISR) yield only global scalar scores, limiting their ability to capture localized distortions and leading to reward hacking and perception-fidelity misalignment. To address this, we propose the Fine-grained Perceptual Reward Model (FinPercep-RM), the first IQA framework to jointly output a spatially localized perceptual degradation map and a global quality score. We further introduce Co-evolutionary Curriculum Learning (CCL), a training paradigm in which the reward model and the generator follow synchronized curricula, improving training stability and mitigating reward hacking. Built on an encoder-decoder architecture, FinPercep-RM is trained on our newly constructed FGR-30k dataset of distortions produced by real-world super-resolution models and validated within an RLHF framework. Results show that, with negligible changes in PSNR/SSIM, LPIPS improves by 12.6% and the user-preference win rate increases by 28.4%, significantly enhancing both local realism and global perceptual consistency.
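To make the dual-output design concrete, here is a minimal PyTorch sketch of an encoder-decoder reward model that returns both a global quality score and a per-pixel degradation map. The class name `DualHeadRewardModel`, the backbone, layer sizes, and head designs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-head encoder-decoder reward model (assumed design,
# not the authors' code): one head scores the whole image, the other localizes defects.
import torch
import torch.nn as nn

class DualHeadRewardModel(nn.Module):
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        # Encoder: downsample the super-resolved image into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder head: upsample back to a per-pixel degradation map in [0, 1].
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat * 2, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # Global head: pool encoder features into a single scalar quality score.
        self.global_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat * 2, 1)
        )

    def forward(self, sr_image):
        feats = self.encoder(sr_image)
        degradation_map = self.decoder(feats)    # where the image looks wrong
        quality_score = self.global_head(feats)  # how good it looks overall
        return quality_score, degradation_map

# Example: score a batch of two 128x128 super-resolved crops.
model = DualHeadRewardModel()
score, deg_map = model(torch.randn(2, 3, 128, 128))
print(score.shape, deg_map.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 128, 128])
```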
📝 Abstract
Reinforcement Learning from Human Feedback (RLHF) has proven effective for image generation, where reward models guide optimization toward human preferences. Motivated by this, adapting RLHF to Image Super-Resolution (ISR) has shown promise in optimizing perceptual quality, with Image Quality Assessment (IQA) models serving as reward models. However, traditional IQA models usually output a single global score, which is insensitive to local and fine-grained distortions. This insensitivity allows ISR models to produce perceptually undesirable artifacts that still receive spuriously high scores, misaligning the optimization objective with perceptual quality and resulting in reward hacking. To address this, we propose a Fine-grained Perceptual Reward Model (FinPercep-RM) based on an encoder-decoder architecture. In addition to a global quality score, it generates a Perceptual Degradation Map that spatially localizes and quantifies local defects. To train this model, we introduce the FGR-30k dataset, which consists of diverse and subtle distortions produced by real-world super-resolution models. Despite the effectiveness of FinPercep-RM, its complexity poses significant challenges for generator policy learning, leading to training instability. To address this, we propose a Co-evolutionary Curriculum Learning (CCL) mechanism in which the reward model and the ISR model follow synchronized curricula. The reward model progressively increases in complexity, while the ISR model starts with a simpler global reward for rapid convergence and gradually transitions to the full fine-grained reward. This easy-to-hard strategy enables stable training while suppressing reward hacking. Experiments validate the effectiveness of our method across ISR models, improving both global quality and local realism under RLHF.
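The easy-to-hard schedule can be pictured as a blend that starts from the simple global score and gradually mixes in the spatial degradation signal. The sketch below illustrates this idea; the linear ramp, the mean-penalty form, and the function name `curriculum_reward` are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative curriculum blend of a global score and a fine-grained spatial penalty
# (assumed linear schedule; the actual CCL formulation may differ).
import torch

def curriculum_reward(quality_score, degradation_map, step, total_steps):
    """Blend the simple global reward with the fine-grained spatial penalty over training."""
    alpha = min(1.0, step / total_steps)                 # 0: global score only; 1: full fine-grained signal
    local_penalty = degradation_map.mean(dim=(1, 2, 3))  # average predicted degradation per image
    global_term = quality_score.squeeze(1)
    return (1.0 - alpha) * global_term + alpha * (global_term - local_penalty)

# Usage with reward-model outputs for a batch of two images (shapes are illustrative):
score = torch.randn(2, 1)             # global quality scores
deg_map = torch.rand(2, 1, 128, 128)  # per-pixel degradation predictions in [0, 1]
print(curriculum_reward(score, deg_map, step=500, total_steps=2000))
```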