🤖 AI Summary
Traditional image quality assessment (IQA) models for image super-resolution (ISR) yield only global scalar scores, limiting their ability to capture localized distortions and leading to reward hacking and perception-fidelity misalignment. To address this, we propose the Fine-grained Perceptual Reward Model (FinPercep-RM), the first IQA framework to jointly output a spatially localized perceptual degradation map and a global quality score. We further introduce Co-evolutionary Curriculum Learning (CCL), a training paradigm in which the reward model and the generator follow synchronized curricula, improving training stability and mitigating reward hacking. Built on an encoder-decoder architecture, FinPercep-RM is trained on our newly constructed FGR-30k dataset of distortions produced by real-world super-resolution models and validated within an RLHF framework. Results show that, with negligible changes in PSNR/SSIM, LPIPS improves by 12.6% and the user-preference win rate increases by 28.4%, significantly enhancing both local realism and global perceptual consistency.
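To make the dual-output design concrete, here is a minimal PyTorch sketch of an encoder-decoder reward model that returns both a global quality score and a per-pixel degradation map. The class name `DualHeadRewardModel`, the backbone, layer sizes, and head designs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-head encoder-decoder reward model (assumed design,
# not the authors' code): one head scores the whole image, the other localizes defects.
import torch
import torch.nn as nn

class DualHeadRewardModel(nn.Module):
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        # Encoder: downsample the super-resolved image into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder head: upsample back to a per-pixel degradation map in [0, 1].
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat * 2, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # Global head: pool encoder features into a single scalar quality score.
        self.global_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat * 2, 1)
        )

    def forward(self, sr_image):
        feats = self.encoder(sr_image)
        degradation_map = self.decoder(feats)    # where the image looks wrong
        quality_score = self.global_head(feats)  # how good it looks overall
        return quality_score, degradation_map

# Example: score a batch of two 128x128 super-resolved crops.
model = DualHeadRewardModel()
score, deg_map = model(torch.randn(2, 3, 128, 128))
print(score.shape, deg_map.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 128, 128])
```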
📝 Abstract
Reinforcement Learning from Human Feedback (RLHF) has proven effective for image generation, where reward models guide optimization toward human preferences. Motivated by this, adapting RLHF to Image Super-Resolution (ISR) has shown promise in optimizing perceptual quality, with Image Quality Assessment (IQA) models serving as reward models. However, traditional IQA models usually output a single global score, which is insensitive to local and fine-grained distortions. This insensitivity allows ISR models to produce perceptually undesirable artifacts that still receive spuriously high scores, misaligning the optimization objective with perceptual quality and resulting in reward hacking. To address this, we propose a Fine-grained Perceptual Reward Model (FinPercep-RM) based on an encoder-decoder architecture. In addition to a global quality score, it generates a Perceptual Degradation Map that spatially localizes and quantifies local defects. To train this model, we introduce the FGR-30k dataset, which consists of diverse and subtle distortions produced by real-world super-resolution models. Despite the effectiveness of FinPercep-RM, its complexity poses significant challenges for generator policy learning, leading to training instability. To address this, we propose a Co-evolutionary Curriculum Learning (CCL) mechanism in which the reward model and the ISR model follow synchronized curricula. The reward model progressively increases in complexity, while the ISR model starts with a simpler global reward for rapid convergence and gradually transitions to the full fine-grained reward. This easy-to-hard strategy enables stable training while suppressing reward hacking. Experiments validate the effectiveness of our method across ISR models, improving both global quality and local realism under RLHF.
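The easy-to-hard schedule can be pictured as a blend that starts from the simple global score and gradually mixes in the spatial degradation signal. The sketch below illustrates this idea; the linear ramp, the mean-penalty form, and the function name `curriculum_reward` are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative curriculum blend of a global score and a fine-grained spatial penalty
# (assumed linear schedule; the actual CCL formulation may differ).
import torch

def curriculum_reward(quality_score, degradation_map, step, total_steps):
    """Blend the simple global reward with the fine-grained spatial penalty over training."""
    alpha = min(1.0, step / total_steps)                 # 0: global score only; 1: full fine-grained signal
    local_penalty = degradation_map.mean(dim=(1, 2, 3))  # average predicted degradation per image
    global_term = quality_score.squeeze(1)
    return (1.0 - alpha) * global_term + alpha * (global_term - local_penalty)

# Usage with reward-model outputs for a batch of two images (shapes are illustrative):
score = torch.randn(2, 1)             # global quality scores
deg_map = torch.rand(2, 1, 128, 128)  # per-pixel degradation predictions in [0, 1]
print(curriculum_reward(score, deg_map, step=500, total_steps=2000))
```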