One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models achieve high visual fidelity in image super-resolution (SR) but suffer from prohibitive computational cost; existing acceleration methods often compromise perceptual realism or introduce structural artifacts. To address this, we propose Residual Shift Distillation (RSD), the first distillation framework leveraging a *pseudo-teacher*—a “fake ResShift” model trained from the student—to align with the original teacher, enabling end-to-end single-step SR reconstruction. RSD integrates ResShift knowledge distillation, residual shift modeling, single-step noise prediction, input alignment constraints, and a lightweight architecture. Evaluated on RealSR and DIV2K, RSD achieves state-of-the-art perceptual quality (e.g., superior LPIPS and FID scores), accelerates inference by over 100× compared to iterative diffusion baselines, and reduces parameter count and GPU memory usage significantly relative to text-to-image diffusion models. Crucially, reconstructions remain highly faithful to the degradation characteristics of the input.

📝 Abstract
Diffusion models for super-resolution (SR) produce high-quality visual results but incur expensive computational costs. Despite the development of several methods to accelerate diffusion-based SR models, some (e.g., SinSR) fail to produce realistic perceptual details, while others (e.g., OSEDiff) may hallucinate non-existent structures. To overcome these issues, we present RSD, a new distillation method for ResShift, one of the top diffusion-based SR models. Our method trains the student network so that a new fake ResShift model, trained on the student's outputs, coincides with the teacher model. RSD achieves single-step restoration and outperforms the teacher by a large margin. We show that our distillation method surpasses the other distillation-based method for ResShift, SinSR, making it on par with state-of-the-art diffusion-based SR distillation methods. Compared to SR methods based on pre-trained text-to-image models, RSD produces competitive perceptual quality, yields images better aligned with the degraded inputs, and requires fewer parameters and less GPU memory. We provide experimental results on various real-world and synthetic datasets, including RealSR, RealSet65, DRealSR, ImageNet, and DIV2K.
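The core mechanism in the abstract — training the student so that a "fake" model fit to the student's outputs agrees with the fixed teacher — can be illustrated with a toy distribution-matching sketch. Everything below is a hypothetical 1-D Gaussian analogue, not the paper's actual networks or losses: `teacher_mean`, `theta`, the sample size, and the learning rate are illustrative assumptions. The student's update direction is the difference between the fake model's score and the teacher's score, which for unit-variance Gaussians reduces to a difference of means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen teacher: the distribution N(teacher_mean, 1).
# Its score at a point x is (teacher_mean - x).
teacher_mean = 3.0

# Stand-in for the one-step student: a single parameter theta; the
# "generator" maps noise to samples theta + noise in one step.
theta = -2.0
lr = 0.5

for step in range(200):
    # One-step student samples.
    samples = theta + rng.normal(size=256)

    # "Fake" model: refit on the current student outputs. For a
    # unit-variance Gaussian this fit is just the sample mean.
    fake_mean = samples.mean()

    # Distribution-matching update: fake score minus teacher score.
    # For unit-variance Gaussians this difference is constant in x
    # and equals (fake_mean - teacher_mean).
    grad = fake_mean - teacher_mean
    theta -= lr * grad

# theta has been pulled toward the teacher's distribution.
print(theta)
```

When the fake model matches the teacher, the update vanishes, so the student's output distribution has been aligned with the teacher's — the toy counterpart of the fake ResShift model "coinciding" with the teacher after distillation.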
Problem

Research questions and friction points this paper is trying to address.

High computational cost of iterative diffusion-based super-resolution models
Accelerated SR methods (e.g., SinSR) that fail to produce realistic perceptual details
Other accelerated SR methods (e.g., OSEDiff) that hallucinate non-existent structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step residual shifting diffusion for SR
Distillation method enhances perceptual details
Reduces parameters and GPU memory usage