🤖 AI Summary
In real-world image super-resolution (Real-ISR), existing methods struggle to jointly optimize fidelity and perceptual realism, and they offer no flexible, user-controllable trade-off between the two. To address this, we propose the first unified one-step diffusion (OSD) framework that enables explicit, controllable fidelity–realism balancing. Methodologically, we introduce a novel latent-space grouping strategy that enables fine-grained control during noise prediction. We further integrate degradation-aware sampling and visual prompt injection to steer reconstruction preferences dynamically within a single denoising step, which removes the need for iterative refinement and significantly improves inference efficiency. Extensive experiments on multiple Real-ISR benchmarks show that our approach consistently outperforms state-of-the-art one-step diffusion models in PSNR, LPIPS, and human perceptual evaluation, delivering high fidelity, strong perceptual realism, and real-time inference.
📝 Abstract
Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods are significantly more efficient than traditional multi-step approaches, they remain limited in balancing fidelity and realism across diverse scenarios. Because OSD models for SR are usually trained or distilled at a single timestep, they lack flexible control mechanisms to adaptively prioritize these competing objectives, whereas multi-step methods can manage the trade-off inherently by adjusting the number of sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over the fidelity-realism trade-off during the noise prediction phase, with minimal modifications to the training paradigm and using the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and to strengthen control over the trade-off. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, improving both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at inference time. The code will be released.
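For readers unfamiliar with one-step diffusion, the sketch below illustrates the generic idea of recovering a clean latent from a single noise prediction, using the standard DDPM forward relation z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε solved for z_0. It is not the RCOD implementation: the function names `one_step_denoise` and `add_noise` are hypothetical, latents are plain lists of floats, and the realism control, degradation-aware sampling, and visual prompt modules described in the abstract are not modeled here.

```python
import math


def add_noise(z_0, eps, alpha_bar_t):
    """Forward diffusion: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    sqrt_a = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a = math.sqrt(1.0 - alpha_bar_t)
    return [sqrt_a * z + sqrt_one_minus_a * e for z, e in zip(z_0, eps)]


def one_step_denoise(z_t, eps_pred, alpha_bar_t):
    """Estimate the clean latent z_0 from one noise prediction.

    Hypothetical helper for illustration only. In a real OSD model,
    eps_pred would come from a single forward pass of the noise
    predictor at a fixed timestep; here it is passed in directly.
    """
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in (0, 1]")
    sqrt_a = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a = math.sqrt(1.0 - alpha_bar_t)
    return [(z - sqrt_one_minus_a * e) / sqrt_a for z, e in zip(z_t, eps_pred)]


if __name__ == "__main__":
    z_0 = [0.5, -1.0, 2.0]   # toy "clean" latent
    eps = [0.1, -0.3, 0.7]   # toy noise sample
    z_t = add_noise(z_0, eps, alpha_bar_t=0.25)
    # With a perfect noise prediction, the clean latent is recovered.
    print(one_step_denoise(z_t, eps, alpha_bar_t=0.25))
```

When the noise prediction is exact, the estimate equals the original latent up to floating-point error; in practice the quality of the single prediction determines where the output lands between fidelity and realism, which is the trade-off the abstract says RCOD makes controllable.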