GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing one-step generative image super-resolution (ISR) methods suffer from limited sample diversity, inadequate modeling of local details, and reliance on offline paired training data. To address these limitations, this work proposes GDPO (Group Direct Preference Optimization), a framework that introduces group-wise relative preference optimization into one-step diffusion models, enabling online reinforcement learning. The approach combines a noise-aware one-step diffusion architecture with an unequal-timestep strategy that decouples the timestep of noise injection from that of diffusion. It further employs an attribute-aware reward function that dynamically scores each sample's smooth regions and textural details. Experiments show that GDPO improves the global consistency, fine-texture recovery, and visual realism of reconstructed images while maintaining efficient inference.
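The group-relative advantage that GDPO borrows from GRPO can be sketched as follows. The normalization below is the standard GRPO group baseline (reward minus group mean, divided by group standard deviation); the function name and example rewards are illustrative, not from the paper.

```python
import numpy as np

def group_relative_advantage(rewards, eps=1e-8):
    """Normalize each sample's reward against its group's statistics.

    This is the GRPO-style group baseline: samples scoring above the
    group mean receive a positive advantage, those below a negative one.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# For a group of ISR outputs generated online from the same input
# (hypothetical reward values):
adv = group_relative_advantage([0.81, 0.74, 0.90, 0.66])
# Samples 0 and 2 score above the group mean, so their advantage is positive.
```

In GDPO, such per-sample advantages weight a DPO-style preference objective over online-generated sample groups rather than offline pairs.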

📝 Abstract
Recently, reinforcement learning (RL) has been employed for improving generative image super-resolution (ISR) performance. However, the current efforts are focused on multi-step generative ISR, while one-step generative ISR remains underexplored due to its limited stochasticity. In addition, RL methods such as Direct Preference Optimization (DPO) require the generation of positive and negative sample pairs offline, leading to a limited number of samples, while Group Relative Policy Optimization (GRPO) only calculates the likelihood of the entire image, ignoring local details that are crucial for ISR. In this paper, we propose Group Direct Preference Optimization (GDPO), a novel approach to integrate RL into one-step generative ISR model training. First, we introduce a noise-aware one-step diffusion model that can generate diverse ISR outputs. To prevent performance degradation caused by noise injection, we introduce an unequal-timestep strategy to decouple the timestep of noise addition from that of diffusion. We then present the GDPO strategy, which integrates the principle of GRPO into DPO, to calculate the group-relative advantage of each online generated sample for model optimization. Meanwhile, an attribute-aware reward function is designed to dynamically evaluate the score of each sample based on its statistics of smooth and texture areas. Experiments demonstrate the effectiveness of GDPO in enhancing the performance of one-step generative ISR models. Code: https://github.com/Joyies/GDPO.
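The abstract describes the attribute-aware reward only at a high level. A minimal sketch of the idea, assuming tile-wise local variance as the smooth-vs-texture statistic and a generic per-pixel quality map — both are assumptions for illustration, not the paper's actual design:

```python
import numpy as np

def attribute_aware_reward(img, quality_map, var_thresh=0.01, window=8):
    """Hypothetical sketch: score smooth and textured regions separately.

    `quality_map` holds a per-pixel quality score from any image-quality
    metric. The smooth/texture split via local variance is only an assumed
    proxy for the paper's "statistics of smooth and texture areas".
    """
    h, w = img.shape[:2]
    hh, ww = h - h % window, w - w % window
    # Tile the image and compute per-tile variance and mean quality.
    tiles = img[:hh, :ww].reshape(hh // window, window, ww // window, window)
    var = tiles.var(axis=(1, 3))
    q = quality_map[:hh, :ww].reshape(hh // window, window, ww // window, window)
    q_tile = q.mean(axis=(1, 3))
    texture = var > var_thresh
    smooth_score = q_tile[~texture].mean() if (~texture).any() else 0.0
    texture_score = q_tile[texture].mean() if texture.any() else 0.0
    # Dynamic weighting: emphasize whichever attribute dominates the image.
    w_tex = texture.mean()
    return w_tex * texture_score + (1 - w_tex) * smooth_score

# Toy example: left half flat (smooth), right half noisy (textured).
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 32:] = rng.uniform(size=(64, 32))
reward = attribute_aware_reward(img, quality_map=np.ones((64, 64)))
```

The key point the sketch illustrates is that the reward adapts its weighting to the content of each sample instead of scoring the whole image uniformly.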
Problem

Research questions and friction points this paper is trying to address.

one-step generative image super-resolution
reinforcement learning
Direct Preference Optimization
Group Relative Policy Optimization
image detail preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group Direct Preference Optimization
one-step generative image super-resolution
noise-aware diffusion model
unequal-timestep strategy
attribute-aware reward