🤖 AI Summary
Existing one-step real-world image super-resolution (Real-ISR) methods gain efficiency by replacing the noise-started generation of diffusion models with direct low-resolution (LR) to high-resolution (HR) restoration, which weakens stochasticity and limits realistic detail synthesis. This work proposes SMFSR, a framework that keeps the random-noise starting point of pretrained diffusion models in a one-step setting. It learns a direct noise-to-HR mapping with an LR-conditioned SplitMeanFlow, in which Interval Splitting Consistency distills the multi-step generative trajectory into a single average-velocity prediction. A GAN refinement stage then improves textural realism with a DINOv3-based discriminator and uses variational score distillation under a frozen diffusion teacher to align outputs with the natural image distribution. The reported experiments show state-of-the-art perceptual quality among one-step diffusion-based Real-ISR methods while retaining fast single-step inference.
📝 Abstract
Pre-trained text-to-image (T2I) diffusion models have shown strong potential for real-world image super-resolution (Real-ISR), owing to their noise-started generation process that enables realistic texture synthesis and captures the one-to-many nature of super-resolution. However, diffusion-based Real-ISR methods still face a fundamental efficiency-quality trade-off. Multi-step methods generate high-quality results by iteratively denoising random Gaussian noise under LR conditioning, but suffer from slow sampling. Recent one-step methods greatly improve efficiency, yet they typically replace noise-started generation with direct LR-to-HR restoration, which weakens stochasticity and limits realistic detail synthesis. To address this issue, we propose SMFSR, a noise-started one-step Real-ISR framework via LR-conditioned SplitMeanFlow and GAN refinement. SMFSR preserves the random-noise starting point of diffusion models and learns a direct noise-to-HR mapping conditioned on the LR image. To this end, Interval Splitting Consistency distills the multi-step generative trajectory into a single average-velocity prediction, enabling efficient one-step generation. To compensate for the reduced opportunity for progressive refinement, we further introduce a GAN refinement stage, where a DINOv3-based discriminator enhances realistic texture synthesis and variational score distillation aligns the generated outputs with the natural image distribution under a frozen diffusion teacher. Extensive experiments demonstrate that SMFSR achieves state-of-the-art perceptual quality among one-step diffusion-based Real-ISR methods while retaining fast single-step inference.
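The idea of distilling a trajectory into a single average-velocity prediction can be illustrated with a small numerical sketch. It is not the SMFSR implementation. It assumes the usual MeanFlow-style convention (t = 1 is noise, t = 0 is data, and z_r = z_t - (t - r) * u(z_t, r, t)) and the interval-splitting identity u(z_t, r, t) = (1 - lam) * u(z_s, r, s) + lam * u(z_t, s, t) with s = (1 - lam) * t + lam * r. A toy linear ODE with a closed-form average velocity stands in for the network, and the names `A`, `avg_velocity`, `split_target`, `one_step_sample`, and `cond` are hypothetical.

```python
import numpy as np

A = 0.7  # hypothetical rate of the toy ODE dz/dt = A * z (stand-in for a learned flow)


def avg_velocity(z_t, r, t, cond=None):
    """Exact average velocity of the toy flow over [r, t], evaluated at endpoint z_t.

    `cond` is a placeholder for the LR conditioning; this toy flow ignores it.
    """
    if np.isclose(t, r):
        return A * z_t  # zero-length interval: instantaneous velocity
    return z_t * (1.0 - np.exp(-A * (t - r))) / (t - r)


def split_target(u, z_t, r, t, lam, cond=None):
    """Right-hand side of the interval-splitting identity for a split point s in [r, t]."""
    s = (1.0 - lam) * t + lam * r
    u_late = u(z_t, s, t, cond)       # average velocity over [s, t]
    z_s = z_t - (t - s) * u_late      # move from time t back to time s
    u_early = u(z_s, r, s, cond)      # average velocity over [r, s]
    return (1.0 - lam) * u_early + lam * u_late


def one_step_sample(u, noise, cond=None):
    """Single-step generation: jump from noise (t = 1) to data (t = 0)."""
    return noise - u(noise, 0.0, 1.0, cond)


rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 4))

# For an exact average velocity, both sides of the identity agree up to rounding.
target = split_target(avg_velocity, noise, 0.0, 1.0, lam=0.3)
direct = avg_velocity(noise, 0.0, 1.0)
consistency_gap = float(np.max(np.abs(target - direct)))

# One function evaluation lands on the ODE solution at t = 0.
sample = one_step_sample(avg_velocity, noise)
```

In a training setting, a network would presumably replace `avg_velocity`, and the gap between its direct prediction and a gradient-stopped `split_target` would serve as the consistency loss. How SMFSR weights or samples the intervals is not stated in the abstract.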