🤖 AI Summary
In single-image super-resolution (SISR), insufficient high-frequency detail generation often leads to artifacts and texture distortion. Existing DDPM-based diffusion models directly predict wide-bandwidth high-frequency components and use only the HR ground truth as the target at every denoising step, causing frequency-domain mismatch and hallucination. To address this, we propose FDDiff, a frequency-domain-guided multiscale diffusion model. FDDiff introduces a wavelet packet-based frequency complement chain that decomposes high-frequency reconstruction into fine-grained steps of increasing bandwidth, and uses these intermediate targets to guide the reverse diffusion process progressively. A unified multiscale frequency refinement network predicts the required high-frequency components at every scale. Additionally, the model incorporates frequency-domain target scheduling and an end-to-end differentiable high-frequency prediction mechanism. Evaluated on standard benchmarks including Set5 and Set14, FDDiff outperforms existing generative SISR methods in both reconstruction fidelity and texture realism.
📝 Abstract
The performance of single-image super-resolution depends heavily on how well high-frequency details are generated and added to low-resolution images. Recently, diffusion-based models have shown great potential for generating high-quality images in super-resolution tasks. However, existing models struggle to directly predict wide-bandwidth high-frequency information because they use only the high-resolution ground truth as the target at every sampling timestep. To tackle this problem and achieve higher-quality super-resolution, we propose a novel Frequency Domain-guided multiscale Diffusion model (FDDiff), which decomposes the high-frequency complementing process into finer-grained steps. In particular, a wavelet packet-based frequency complement chain is developed to provide multiscale intermediate targets with increasing bandwidth for the reverse diffusion process. FDDiff then guides the reverse diffusion process to progressively complement the missing high-frequency details over timesteps. Moreover, we design a multiscale frequency refinement network that predicts the required high-frequency components at multiple scales within one unified network. Comprehensive evaluations on popular benchmarks demonstrate that FDDiff outperforms prior generative methods, producing higher-fidelity super-resolution results.
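To make the idea of intermediate targets with increasing bandwidth more concrete, here is a minimal sketch. It is not FDDiff's implementation: it assumes a 1D signal instead of an image, an orthonormal Haar wavelet, full-resolution targets rather than multiscale ones, and a hypothetical linear timestep schedule. All function names (`complement_chain`, `target_for_timestep`, and the helpers) are illustrative and do not come from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar analysis step: (low-pass half, high-pass half)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out

def packet_decompose(x, levels):
    """Full wavelet packet tree: every band is split again at every level.

    len(x) must be divisible by 2**levels. Bands come back in tree order.
    """
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def packet_reconstruct(bands):
    """Merge sibling bands pairwise until a single signal remains."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def frequency_order(n_bands):
    """Tree-order indices sorted from lowest to highest frequency.

    High-pass filtering followed by downsampling flips the spectrum, so the
    frequency rank r maps to tree index gray(r) = r ^ (r >> 1).
    """
    return [r ^ (r >> 1) for r in range(n_bands)]

def complement_chain(signal, levels):
    """Targets whose bandwidth grows one sub-band at a time.

    The first target keeps only the lowest band; the last equals the input.
    """
    bands = packet_decompose(signal, levels)
    order = frequency_order(len(bands))
    targets = []
    for k in range(1, len(bands) + 1):
        keep = set(order[:k])
        masked = [b if i in keep else [0.0] * len(b)
                  for i, b in enumerate(bands)]
        targets.append(packet_reconstruct(masked))
    return targets

def target_for_timestep(t, num_timesteps, targets):
    """Hypothetical linear schedule (an assumption, not the paper's):
    t = num_timesteps selects the narrowest target, t = 0 the full-band one."""
    progress = 1.0 - t / num_timesteps
    idx = min(int(progress * len(targets)), len(targets) - 1)
    return targets[idx]

# Example: an 8-sample "high-resolution" signal and a 2-level packet tree.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
targets = complement_chain(signal, levels=2)
```

In this sketch, a denoiser at timestep `t` would be trained toward `target_for_timestep(t, T, targets)` instead of always regressing to the full-band signal. For images, the same split would be applied along rows and columns, and a library such as PyWavelets could replace the hand-written Haar steps.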