🤖 AI Summary
Arbitrary-scale image super-resolution (ArbiSR) must model a broad, continuous range of scale factors, yet existing single-stage methods generalize poorly across that range, and progressive upsampling with diffusion models has not been systematically explored. This paper proposes CasArbi, a self-cascaded diffusion framework that decomposes a large upsampling factor into a sequence of smaller diffusion stages. Its coordinate-guided residual diffusion model combines coordinate embeddings with a continuous scale parameterization, enabling efficient, flexible, scale-aware modeling and sampling. Evaluated on multiple ArbiSR benchmarks, CasArbi outperforms existing diffusion-based and non-diffusion-based approaches in both perceptual quality (LPIPS) and fidelity (PSNR/SSIM).
📝 Abstract
Arbitrary-scale image super-resolution aims to upsample images to any desired resolution, offering greater flexibility than traditional fixed-scale super-resolution. Recent approaches in this domain use regression-based or generative models, but many of them upsample in a single stage, which can be difficult to learn across a wide, continuous distribution of scaling factors. Progressive upsampling strategies have shown promise in mitigating this issue, yet their integration with diffusion models for flexible upscaling remains underexplored. Here, we present CasArbi, a novel self-cascaded diffusion framework for arbitrary-scale image super-resolution. CasArbi meets varying scaling demands by breaking the target scale down into smaller sequential factors and progressively enhancing the image resolution at each step, with seamless transitions for arbitrary scales. Our coordinate-guided residual diffusion model learns continuous image representations while enabling efficient diffusion sampling. Extensive experiments demonstrate that CasArbi outperforms prior art in both perceptual and distortion metrics across diverse arbitrary-scale super-resolution benchmarks.
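To make the idea of breaking a target scale into smaller sequential factors concrete, here is a minimal Python sketch. The abstract does not say how CasArbi chooses its per-stage factors, so the rule below is an assumption: split the target scale into equal factors, using the fewest stages that keep each factor at or below a cap. The names `decompose_scale` and `stage_sizes` and the cap `max_stage = 2.0` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def decompose_scale(scale: float, max_stage: float = 2.0) -> List[float]:
    """Split a target scale into equal per-stage factors.

    Assumed rule (not from the paper): use the fewest stages such that
    each factor is at most max_stage, up to a tiny float tolerance.
    """
    if scale < 1.0:
        raise ValueError("scale must be >= 1")
    if max_stage <= 1.0:
        raise ValueError("max_stage must be > 1")
    if scale == 1.0:
        return []  # nothing to upsample
    # The small epsilon guards against floating-point noise in the log
    # ratio, which could otherwise add a spurious extra stage.
    n_stages = max(1, math.ceil(math.log(scale) / math.log(max_stage) - 1e-9))
    factor = scale ** (1.0 / n_stages)
    return [factor] * n_stages


def stage_sizes(height: int, width: int, factors: List[float]) -> List[Tuple[int, int]]:
    """Output resolution after each stage, rounded from the cumulative scale."""
    sizes = []
    cumulative = 1.0
    for f in factors:
        cumulative *= f
        sizes.append((round(height * cumulative), round(width * cumulative)))
    return sizes


if __name__ == "__main__":
    for target in (1.5, 4.0, 6.0):
        factors = decompose_scale(target)
        print(target, factors, stage_sizes(48, 48, factors))
```

Under this assumed rule, a non-power-of-two target such as 6 is handled by three equal stages of 6^(1/3) each, rather than two stages of 2 followed by one of 1.5. Either split multiplies out to the same total scale; which one a real cascade should use depends on how the per-stage model was trained.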