🤖 AI Summary
This work addresses the lack of objective evaluation standards for the aesthetic appeal of AI-based super-resolution (SR) images. To this end, we construct the first large-scale, subjectively annotated dataset for appeal assessment, comprising 1,496 SR images derived from 136 source images and generated by five state-of-the-art SR methods, and establish a benchmark evaluation protocol based on crowdsourced subjective ratings collected with the open-source tool AVRate Voyager. We propose two interpretable appeal predictors, one leveraging ResNet-based transfer features and one using handcrafted signal features (e.g., texture and sharpness statistics) with a random forest, complemented by a DNN classifier that attributes each image to the up-scaling method that produced it. Experiments show that Real-ESRGAN and BSRGAN yield the most appealing outputs, and that our models outperform established image appeal and quality metrics, including LPIPS and MANIQA, in appeal prediction. Both code and dataset are fully open-sourced.
📝 Abstract
DNN- or AI-based up-scaling algorithms are gaining in popularity due to improvements in machine learning. Various up-scaling models using CNNs, GANs, or mixed approaches have been published. The majority of these models are evaluated using PSNR and SSIM or only a few example images. However, a performance evaluation with a wide range of real-world images and subjective ratings is missing, which we tackle in the following paper. For this reason, we describe our developed dataset, which uses 136 base images and five different up-scaling methods, namely Real-ESRGAN, BSRGAN, waifu2x, KXNet, and Lanczos. Overall, the dataset consists of 1,496 annotated images. The labeling of our dataset focused on image appeal and was performed via crowdsourcing using our open-source tool AVRate Voyager. We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN perform best. Furthermore, we train a DNN to detect which up-scaling method has been used; the trained models show good overall performance in our evaluation. In addition, we evaluate state-of-the-art image appeal and quality models; none of them achieved high prediction performance, so we also trained two approaches of our own. The first uses transfer learning and performs best; the second uses signal-based features with a random forest model and achieves good overall performance. We share the data and implementation to allow further research in the context of open science.
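The second approach mentioned above, a random forest regressor over signal-based features, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature set here (Laplacian variance as a sharpness proxy, global contrast, gradient energy) and the 5-point rating scale are assumptions for the example, and the images and ratings are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def signal_features(img):
    """Illustrative handcrafted features: sharpness via Laplacian
    variance, contrast via standard deviation, and gradient energy.
    (Hypothetical feature set, not the paper's exact one.)"""
    img = img.astype(np.float64)
    # Discrete Laplacian via finite differences (sharpness proxy)
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    gy, gx = np.gradient(img)
    return np.array([lap.var(), img.std(), np.mean(gx**2 + gy**2)])

# Synthetic stand-ins for annotated SR crops and mean appeal ratings
rng = np.random.default_rng(0)
images = rng.random((20, 32, 32))           # grayscale crops in [0, 1]
ratings = rng.uniform(1, 5, size=20)        # assumed 5-point appeal scores

X = np.stack([signal_features(im) for im in images])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, ratings)
pred = model.predict(X[:1])                 # predicted appeal score
```

In the real pipeline, `images` and `ratings` would come from the crowdsourced annotations, and the predicted scores would be compared against the subjective mean ratings per SR method.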