Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes SKILD, a framework that handles both image generation and super-resolution within a single unconditional diffusion model. The two tasks are traditionally handled by separate models, which makes coherent support for arbitrary scale factors difficult. SKILD's scale-invariant forward process progressively attenuates image content from fine to coarse scales in the spectral domain while injecting spectrum-matched Gaussian noise, so that scale becomes an explicit coordinate of the diffusion dynamics. The same trained reverse process then covers both tasks by varying only the starting timestep, with no task-specific architecture, conditioning branch, classifier-free guidance, or per-scale retraining. On unconditional CIFAR-10, SKILD achieves an FID of 2.65 and an Inception Score of 9.63. On ImageNet, a single unconditional checkpoint performs continuous 2×–8× super-resolution and outperforms conditional models across perceptual metrics. On the critical Ising model, its reconstructions have connected four-point correlations that closely track the ground truth.
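To make the spectral forward process concrete, here is a minimal NumPy sketch of one way such a corruption step could look. It is not the paper's implementation. The function names (`radial_frequency`, `attenuation`, `forward_sample`), the geometric cutoff schedule, the quadratic roll-off, the `max_scale` parameter, and the $1/|k|^{\alpha}$ noise spectrum are all assumptions chosen for illustration; the summary only states that fine scales are attenuated first and that the injected Gaussian noise is matched to the image spectrum.

```python
import numpy as np

def radial_frequency(n):
    """|k| on an n x n FFT grid, in cycles per pixel."""
    f = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(f, f, indexing="ij")
    return np.sqrt(kx**2 + ky**2)

def attenuation(k, t, max_scale):
    """Hypothetical per-frequency signal weight at time t in [0, 1].

    Assumption: the cutoff frequency shrinks geometrically with t.
    Frequencies below the cutoff are kept; those above it roll off
    as (cutoff / k)^2, so fine scales fade before coarse ones.
    """
    cutoff = k.max() / max_scale**t
    k_safe = np.where(k == 0, cutoff, k)  # the mean (DC) is always kept
    return np.minimum(1.0, (cutoff / k_safe) ** 2)

def forward_sample(image, t, max_scale=None, alpha=2.0, rng=None):
    """Corrupt a square grayscale image to time t (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    n = image.shape[0]
    max_scale = n if max_scale is None else max_scale
    k = radial_frequency(n)
    keep = attenuation(k, t, max_scale)

    # Assumption: the data power spectrum falls off as 1 / |k|^alpha, so the
    # noise is shaped the same way. Only the spectral shape is set here; the
    # absolute noise scale would need calibrating against real data.
    k_safe = np.where(k == 0, 1.0 / n, k)
    spectral_shape = k_safe ** (-alpha / 2.0)
    noise_hat = np.fft.fft2(rng.standard_normal((n, n))) * spectral_shape

    # Noise fills in exactly the bands where the signal was attenuated.
    sigma = np.sqrt(1.0 - keep**2)
    x_hat = keep * np.fft.fft2(image) + sigma * noise_hat
    return np.fft.ifft2(x_hat).real
```

In this sketch, `t = 0` returns the input unchanged, and increasing `t` pushes the cutoff toward coarser scales, which is the sense in which scale acts as a coordinate of the process.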

📝 Abstract

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce $\textbf{SKILD}$, a $\textbf{S}$cale-invariant $\textbf{K}$-Space $\textbf{I}$mage $\textbf{L}$earning $\textbf{D}$iffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we leverage it to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs generation and continuous super-resolution by varying only the starting timestep: $\textit{no task-specific architecture, no conditioning branch, no classifier-free guidance, no retraining per scale factor}$. Empirically, SKILD reaches FID $2.65$ and Inception Score $9.63$ on unconditional CIFAR-10, performs $2\times$--$8\times$ super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics, and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
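The claim that generation and super-resolution differ only in the starting timestep can be illustrated with a small sketch. Assume, purely for illustration, that diffusion time maps to scale geometrically, so that time `t` in `[0, 1]` corresponds to a scale factor `max_scale**t`. The function `start_time` and the parameter `max_scale` are hypothetical names, and the abstract does not specify the actual schedule.

```python
import math

def start_time(scale, max_scale):
    """Starting time in [0, 1] for a given super-resolution factor.

    Assumption (not from the abstract): scale grows geometrically with
    time, scale(t) = max_scale ** t, so equal time steps cover equal
    ratios of scale. Inverting gives t = log(scale) / log(max_scale).
    """
    if not 1.0 <= scale <= max_scale:
        raise ValueError("scale must lie in [1, max_scale]")
    return math.log(scale) / math.log(max_scale)

# Any real-valued factor maps to a valid starting time, which is what
# makes the super-resolution continuous rather than tied to fixed factors.
for s in (2, 3.5, 8):
    print(f"{s}x -> start at t = {start_time(s, max_scale=64):.3f}")
```

Under this assumed schedule, starting at `t = 1` corresponds to generation from noise, and smaller starting times correspond to milder upscaling of an existing coarse image.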

Problem

Research questions and friction points this paper is trying to address.

scale-invariant

image generation

super-resolution

diffusion model

unconditional framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale-invariant diffusion

continuous super-resolution

unconditional generation