🤖 AI Summary
This work addresses the performance degradation of diffusion-based super-resolution methods on real-world low-resolution images caused by distribution shifts. The authors propose Bird-SR, a bidirectional reward-guided diffusion framework that casts super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), prioritizing structural fidelity at early diffusion steps and perceptual quality at later ones. To mitigate reward hacking, rewards for synthetic results are expressed in a relative advantage space bounded by their clean counterparts, and real-world optimization is regularized by a semantic alignment constraint. A dynamic fidelity-perception weighting strategy balances the two objectives across steps. On real-world super-resolution benchmarks, the authors report that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality while preserving structural consistency.
📝 Abstract
Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world low-resolution (LR) images due to distribution shifts. We propose Bird-SR, a bidirectional reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR-HR pairs and real-world LR images. Because structural fidelity is easily degraded under ReFL, the model is optimized directly on synthetic pairs at early diffusion steps; this also helps preserve structure for real-world inputs, since the distribution gap between synthetic and real data is smaller at the structural level. For perceptual enhancement, quality-guided rewards are applied at later sampling steps to both synthetic and real LR images. To mitigate reward hacking, the rewards for synthetic results are formulated in a relative advantage space bounded by their clean counterparts, while real-world optimization is regularized via a semantic alignment constraint. Furthermore, to balance structural and perceptual learning, we adopt a dynamic fidelity-perception weighting strategy that emphasizes structure preservation at early diffusion steps and progressively shifts focus toward perceptual optimization at later steps. Extensive experiments on real-world SR benchmarks demonstrate that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality while preserving structural consistency, validating its effectiveness for real-world super-resolution.
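
The dynamic fidelity-perception weighting and the bounded relative-advantage reward described above can be illustrated with a minimal sketch. The abstract does not give the actual schedule or reward formula, so everything here is an assumption made for illustration: the cosine schedule, the cap at zero, and the names `fidelity_weight`, `relative_advantage`, and `combined_loss` are hypothetical and are not taken from the paper.

```python
import math


def fidelity_weight(step: int, num_steps: int) -> float:
    """Weight on the fidelity term at a given sampling step.

    ASSUMPTION: a cosine ramp from 1.0 (first step) to 0.0 (last step).
    The abstract only says the weight shifts from structure toward
    perception; the real schedule is not specified there.
    """
    progress = step / max(num_steps - 1, 1)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def relative_advantage(reward_sr: float, reward_hr: float) -> float:
    """Reward of a synthetic SR result relative to its clean HR counterpart.

    ASSUMPTION: "bounded by the clean counterpart" is modeled as capping
    the advantage at 0, so scoring above the HR image earns nothing extra.
    This is one plausible reading, chosen to discourage reward hacking.
    """
    return min(reward_sr - reward_hr, 0.0)


def combined_loss(fidelity_loss: float, reward: float,
                  step: int, num_steps: int) -> float:
    """Blend the fidelity loss and the (negated) reward with the dynamic weight."""
    w = fidelity_weight(step, num_steps)
    return w * fidelity_loss - (1.0 - w) * reward


if __name__ == "__main__":
    # Show how the emphasis moves from fidelity to perception over 5 steps.
    for step in range(5):
        print(step, round(fidelity_weight(step, 5), 3))
```

Under these assumptions, the first step is driven entirely by the fidelity term and the last step entirely by the reward term. A linear or piecewise schedule would serve the same illustrative purpose.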