FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world image super-resolution (Real-ISR), diffusion models suffer from a low-frequency bias and a hierarchical "low-frequency-first, high-frequency-later" reconstruction pattern, which hinders faithful recovery of high-frequency details. To address this, the authors propose FRAMER, a plug-and-play self-distillation training framework. First, FFT-based frequency-band decomposition enables frequency-guided inter-layer distillation. Second, intra- and inter-sample contrastive losses, augmented with stochastic negative sampling, enhance structural consistency and detail fidelity. Third, a Frequency-based Adaptive Weight (FAW) and a Frequency-based Alignment Modulation (FAM) dynamically balance learning between low-frequency global structure and high-frequency local textures. Crucially, the method requires no architectural modifications or inference-time changes. Evaluated on U-Net and DiT backbones, it consistently improves fidelity metrics (PSNR, SSIM) and perceptual quality (LPIPS, NIQE, MANIQA, MUSIQ), demonstrating strong generalizability and effectiveness across diverse diffusion-based Real-ISR models.

📝 Abstract
Real-world image super-resolution (Real-ISR) seeks to recover high-resolution (HR) images from low-resolution (LR) inputs with mixed, unknown degradations. While diffusion models surpass GANs in perceptual quality, they under-reconstruct high-frequency (HF) details due to a low-frequency (LF) bias and a depth-wise "low-first, high-later" hierarchy. We introduce FRAMER, a plug-and-play training scheme that exploits diffusion priors without changing the backbone or inference. At each denoising step, the final-layer feature map teaches all intermediate layers. Teacher and student feature maps are decomposed into LF/HF bands via FFT masks to align supervision with the model's internal frequency hierarchy. For LF, an Intra Contrastive Loss (IntraCL) stabilizes globally shared structure. For HF, an Inter Contrastive Loss (InterCL) sharpens instance-specific details using random-layer and in-batch negatives. Two adaptive modulators, Frequency-based Adaptive Weight (FAW) and Frequency-based Alignment Modulation (FAM), reweight per-layer LF/HF signals and gate distillation by current similarity. Across U-Net and DiT backbones (e.g., Stable Diffusion 2, 3), FRAMER consistently improves PSNR/SSIM and perceptual metrics (LPIPS, NIQE, MANIQA, MUSIQ). Ablations validate the final-layer teacher and random-layer negatives.
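The LF/HF split described above can be illustrated with a standard FFT masking scheme. This is a minimal sketch, not the paper's implementation: the circular mask and the `radius_ratio` cutoff are assumptions, since the abstract does not specify the mask design.

```python
import numpy as np

def split_frequency_bands(feat, radius_ratio=0.25):
    """Split a 2-D feature map into low- and high-frequency bands
    using a circular mask in the shifted FFT spectrum.
    radius_ratio is a hypothetical cutoff, not from the paper."""
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    # Distance of each frequency bin from the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    lf_mask = dist <= radius_ratio * min(h, w)
    # Inverse-transform each masked band back to the spatial domain.
    lf = np.fft.ifft2(np.fft.ifftshift(spectrum * lf_mask)).real
    hf = np.fft.ifft2(np.fft.ifftshift(spectrum * (~lf_mask))).real
    return lf, hf
```

Because the two masks partition the spectrum, the bands sum back to the original feature map, so low- and high-frequency supervision can be applied independently without losing information.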
Problem

Research questions and friction points this paper is trying to address.

Enhances high-frequency detail reconstruction in diffusion-based super-resolution
Aligns frequency supervision with model's internal hierarchical structure
Improves perceptual quality and fidelity metrics across diverse backbones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-distillation with frequency-aligned feature maps
Adaptive modulation for per-layer signal reweighting
Contrastive losses stabilizing structure and sharpening details
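The similarity-gated distillation idea (FAM gating supervision by how well a layer already matches the teacher) can be sketched as follows. This is a hypothetical illustration using cosine similarity and an MSE distillation term; the paper's actual gating function is not given in this summary.

```python
import numpy as np

def gated_distill_loss(student, teacher, eps=1e-8):
    """Hypothetical FAM-style gate: the distillation weight shrinks as
    student and teacher features agree, so already-aligned layers
    receive a weaker training signal than misaligned ones."""
    s, t = student.ravel(), teacher.ravel()
    cos = float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t) + eps))
    gate = 1.0 - max(cos, 0.0)  # high similarity -> small gate
    mse = float(np.mean((student - teacher) ** 2))
    return gate * mse
```

With this design a layer whose features already mirror the final-layer teacher contributes almost no loss, concentrating the distillation signal on layers that still deviate.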