🤖 AI Summary
Super-resolution methods often introduce visual artifacts, yet conventional evaluation does not capture how noticeable each artifact is to viewers. This work introduces “artifact prominence,” defined as the fraction of viewers who judge a highlighted region to contain a noticeable artifact, and presents SR-Prominence, a crowdsourced dataset suite of 3,935 artifact masks released with an objective scoring protocol that lets new metrics be benchmarked without further crowdsourcing. An audit of full-reference metrics, no-reference IQA methods, and dedicated artifact detectors finds that classical full-reference metrics, especially SSIM and DISTS, provide strong localized prominence signals, whereas no-reference methods and detectors often fail to generalize across datasets and reference settings. Notably, 48.2% of the artifacts originally labeled in DeSRA are not noticed by a majority of viewers, underscoring the need for perception-driven assessment in super-resolution evaluation.
📝 Abstract
Modern image super-resolution methods generate detailed, visually appealing results, but they often introduce visual artifacts: unnatural patterns and texture distortions that degrade perceived quality. These defects vary widely in perceptual impact (some are barely noticeable, while others are highly disturbing), yet existing detection methods treat them equally. We propose artifact prominence as an evaluative target, defined as the fraction of viewers who judge a highlighted region to contain a noticeable artifact. We design a crowdsourced annotation protocol and construct SR-Prominence, a dataset suite containing 3,935 artifact masks from DeSRA, Open Images, Urban100, and a realistic no-ground-truth Urban100-HR setting, each annotated with prominence. Re-annotating DeSRA reveals that 48.2% of its in-lab, binary-labeled artifacts are not noticed by a majority of viewers. Across the suite, we audit SR artifact detectors, image-quality metrics, and SR methods. We find that classical full-reference metrics, especially SSIM and DISTS, provide surprisingly strong localized prominence signals, whereas no-reference IQA methods and specialized artifact detectors often fail to generalize across datasets and reference settings. SR-Prominence is released with an objective scoring protocol that allows new metrics to be benchmarked on our suite without further crowdsourcing. Together, the data and protocols enable SR artifact evaluation to move from binary defect presence toward perceptual impact. SR-Prominence is available at https://huggingface.co/datasets/imolodetskikh/sr-artifact-prominence.
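The definition of artifact prominence can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the boolean vote format, and the reading of "not noticed by a majority" as prominence at or below 0.5 are all assumptions made for illustration, and the votes are toy data rather than dataset values.

```python
def prominence(votes):
    """Fraction of viewers who judged the highlighted region to contain
    a noticeable artifact. `votes` is a list of booleans, one per viewer
    (assumed input format, not the dataset's actual schema)."""
    if not votes:
        raise ValueError("need at least one vote")
    return sum(1 for v in votes if v) / len(votes)


def share_not_noticed_by_majority(all_votes):
    """Share of masks whose prominence is at most 0.5, i.e. masks that
    fewer than or exactly half of the viewers flagged (assumed threshold)."""
    scores = [prominence(v) for v in all_votes]
    return sum(1 for s in scores if s <= 0.5) / len(scores)


# Toy example: three masks with hypothetical viewer votes.
toy_votes = [
    [True, True, False],   # prominence 2/3
    [False, False, True],  # prominence 1/3
    [True, False],         # prominence 1/2
]
print([round(prominence(v), 3) for v in toy_votes])
print(share_not_noticed_by_majority(toy_votes))
```

A statistic such as the reported 48.2% for DeSRA would correspond to the second function applied to the real annotations, under whatever tie-handling rule the authors actually use.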