Realism Control One-step Diffusion for Real-World Image Super-Resolution

📅 2025-09-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In real-world image super-resolution (Real-ISR), existing methods struggle to jointly optimize fidelity and perceptual realism, and they lack flexible, user-controllable trade-offs between the two. To address this, we propose the first unified one-step diffusion (OSD) framework enabling explicit, controllable fidelity–realism balancing. Methodologically, we introduce a novel latent-space grouping strategy that enables fine-grained control during noise prediction. We further integrate degradation-aware sampling and visual prompt injection to steer reconstruction preferences within a single denoising step, which removes the need for iterative refinement and improves inference efficiency. Extensive experiments on multiple Real-ISR benchmarks demonstrate that our approach outperforms state-of-the-art one-step diffusion models in PSNR, LPIPS, and human perceptual evaluation, delivering high fidelity and strong perceptual realism at one-step inference cost.
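The summary names PSNR as a fidelity metric. As a reminder of what that number measures, here is a minimal sketch in plain Python that computes PSNR over flat lists of pixel values. The function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the paper; LPIPS is not sketched because it depends on a learned network.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Higher values mean the restored image is closer to the reference
    in a pixel-wise (fidelity) sense.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

PSNR rewards pixel-wise agreement only, which is why a restoration can score well on it while still looking over-smoothed; perceptual metrics are reported alongside it for that reason.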

📝 Abstract

Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they still have limitations in balancing fidelity and realism across diverse scenarios. Because OSD models for SR are usually trained or distilled at a single timestep, they lack flexible control mechanisms to adaptively prioritize these competing objectives, which multi-step methods can inherently manage by adjusting the sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over the fidelity-realism trade-off during the noise prediction phase, with minimal modifications to the training paradigm and using the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and to strengthen control over the trade-off. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, enhancing both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at inference time. The code will be released.
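To make the idea of control during noise prediction more concrete, here is a minimal sketch in plain Python. It is not RCOD's implementation: the abstract does not say how the grouping strategy conditions the noise predictor. The sketch assumes the standard DDPM relation for recovering a clean latent from a noisy latent and a noise prediction in one step. The scalar `realism` and the two noise predictions `eps_fidelity` and `eps_realism` are hypothetical stand-ins, blended linearly purely for illustration.

```python
import math


def one_step_estimate(z_t, eps, alpha_bar):
    """Standard DDPM clean-latent estimate from a single noise prediction:
    z0 = (z_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar).
    """
    if not 0.0 < alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in (0, 1]")
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(z - sigma * e) / scale for z, e in zip(z_t, eps)]


def blend_predictions(eps_fidelity, eps_realism, realism):
    """Hypothetical control knob (an assumption, not the paper's mechanism):
    linearly interpolate between two noise predictions.
    realism = 0 keeps eps_fidelity, realism = 1 keeps eps_realism.
    """
    if not 0.0 <= realism <= 1.0:
        raise ValueError("realism must be in [0, 1]")
    return [(1.0 - realism) * f + realism * r
            for f, r in zip(eps_fidelity, eps_realism)]


def controlled_one_step(z_t, eps_fidelity, eps_realism, alpha_bar, realism):
    """Blend the two predictions, then apply the one-step estimate once."""
    eps = blend_predictions(eps_fidelity, eps_realism, realism)
    return one_step_estimate(z_t, eps, alpha_bar)
```

The point of the sketch is only that the control value acts before the single denoising step, so changing it at inference time costs one forward estimate rather than a longer sampling schedule.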

Problem

Research questions and friction points this paper is trying to address.

Balancing fidelity and realism in one-step diffusion super-resolution

Enabling adaptive control of competing objectives in image restoration

Improving efficiency while maintaining perceptual quality in Real-ISR

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent domain grouping strategy for fidelity-realism control

Degradation-aware sampling strategy for enhanced trade-off alignment

Visual prompt injection replacing text with degradation tokens

🔎 Similar Papers

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution