🤖 AI Summary
Super-resolution (SR) inherently suffers from a trade-off between perceptual quality and pixel-level fidelity: conventional L1/L2 losses yield blurry reconstructions, while adversarial training lacks explicit perceptual objectives. To address this, we propose a differentiable, no-reference image quality assessment (NR-IQA)-guided SR framework. Our method systematically evaluates the accuracy and complementarity of mainstream NR-IQA models on human-annotated SR datasets and integrates them as differentiable perceptual objectives into the training pipeline. Through data resampling and joint optimization, the framework explicitly balances perceptual quality and distortion. Crucially, it requires neither ground-truth references nor generative adversarial network (GAN) architectures. Extensive experiments demonstrate substantial improvements in human preference scores across multiple benchmarks, outperforming standard L1/L2-based and adversarial SR methods while preserving structural fidelity. This work establishes the first systematic investigation of NR-IQA models as differentiable perceptual supervisors for SR and delivers a reference-free, GAN-free solution achieving superior perceptual-distortion trade-offs.
📝 Abstract
Super-resolution (SR), a classical inverse problem in computer vision, is inherently ill-posed, inducing a distribution of plausible solutions for every input. However, the desired result is not simply the expectation of this distribution, which is the blurry image obtained by minimizing pixel-wise error, but rather the sample with the highest image quality. A variety of techniques, from perceptual metrics to adversarial losses, are employed to this end. In this work, we explore an alternative: utilizing powerful no-reference image quality assessment (NR-IQA) models in the SR context. We begin with a comprehensive analysis of NR-IQA metrics on human-derived SR data, identifying both the accuracy (human alignment) and complementarity of different metrics. We then explore two methods of applying NR-IQA models to SR learning: (i) altering data sampling, by building on an existing multi-ground-truth SR framework, and (ii) directly optimizing a differentiable quality score. Our results demonstrate a more human-centric perception-distortion tradeoff, placing less weight on non-perceptual pixel-wise distortion and instead improving the balance between perceptual fidelity and human-tuned NR-IQA measures.
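The two uses of NR-IQA described above can be sketched in a few lines. This is a toy illustration only: `iqa_score` below is a hypothetical stand-in (a simple sharpness proxy), not one of the paper's NR-IQA models, which in practice would be learned neural networks; the function names and the weighting parameter `lam` are likewise assumptions for the sketch.

```python
import numpy as np

def iqa_score(img):
    # Hypothetical stand-in for a differentiable NR-IQA model:
    # a toy sharpness proxy (mean gradient magnitude) squashed into [0, 1).
    gx = np.abs(np.diff(img, axis=1)).mean()
    gy = np.abs(np.diff(img, axis=0)).mean()
    return float(np.tanh(gx + gy))

def select_ground_truth(candidates):
    # (i) Data resampling: among multiple plausible ground truths for the
    # same input, keep the one the NR-IQA model rates highest.
    return max(candidates, key=iqa_score)

def combined_loss(sr, hr, lam=0.1):
    # (ii) Joint objective: pixel-wise L1 fidelity plus a quality penalty
    # (1 - score); with a differentiable NR-IQA model this second term
    # can be backpropagated through during SR training.
    return float(np.abs(sr - hr).mean()) + lam * (1.0 - iqa_score(sr))
```

Under this toy scorer, a flat (blurry) image scores lower than a high-contrast one, so `select_ground_truth` prefers the sharper candidate, and `combined_loss` trades off pixel fidelity against the quality term via `lam`.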