🤖 AI Summary
This study addresses the failure of objective image quality assessment (IQA) metrics near the just-noticeable difference (JND) threshold in high-fidelity image compression, where subtle compression artifacts evade reliable detection. We systematically evaluate the sensitivity and reliability of mainstream IQA metrics to such fine-grained distortions. To this end, we propose Z-RMSE, a metric incorporating subjective rating uncertainty, and design a novel statistical evaluation framework grounded in hypothesis testing. Furthermore, we construct and publicly release the first benchmark dataset dedicated to high-fidelity compression, comprising the full-range JPEG AIC-3 dataset, a JND subset, a cropping-effect analysis, and integrated evaluation tools. Experiments reveal that existing metrics suffer from overfitting and insufficient discriminability below the JND threshold. Our approach significantly improves consistency and robustness in fine-grained distortion assessment, providing a reproducible benchmark, a principled statistical framework, and open-source infrastructure for next-generation IQA research.
📝 Abstract
Nowadays, image compression solutions are increasingly designed to operate within high-fidelity quality ranges, where preserving even the most subtle details of the original image is essential. In this context, the ability to detect and quantify subtle compression artifacts becomes critically important, as even slight degradations can impact perceptual quality in professional or quality-sensitive applications, such as digital archiving, professional editing, and web delivery. However, the performance of current objective image quality assessment metrics in this range has not been thoroughly investigated. In particular, it is not well understood how reliably these metrics estimate distortions at or below the Just Noticeable Difference (JND) threshold. This study addresses this gap by proposing methodologies for evaluating objective quality metrics and by performing a comprehensive evaluation on the JPEG AIC-3 dataset, which is designed for high-fidelity image compression. Beyond conventional criteria, the study introduces Z-RMSE to incorporate subjective score uncertainty and applies novel statistical tests to assess significant differences between metrics. The analysis spans the full JPEG AIC-3 range and its high- and medium-fidelity subsets and examines the impact of cropping in subjective tests. A public dataset with benchmarks and evaluation tools is released to support further research.
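The abstract does not give the formula for Z-RMSE. Purely as an illustration of how subjective score uncertainty could enter an RMSE-type criterion, the sketch below assumes that each prediction error is divided by the standard deviation of the corresponding subjective score before the root mean square is taken. The function name `z_rmse`, its parameters, and this normalization are assumptions made for illustration and may differ from the paper's actual definition.

```python
import math


def z_rmse(predicted, subjective, subjective_std):
    """Hypothetical uncertainty-normalized RMSE (assumed form, not the paper's).

    predicted      -- metric predictions mapped to the subjective scale
    subjective     -- subjective quality scores for the same stimuli
    subjective_std -- standard deviation (uncertainty) of each subjective score
    """
    if not (len(predicted) == len(subjective) == len(subjective_std)):
        raise ValueError("all inputs must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for p, s, sd in zip(predicted, subjective, subjective_std):
        if sd <= 0:
            raise ValueError("standard deviations must be positive")
        # Express the error in units of the subjective score's uncertainty.
        z = (p - s) / sd
        total += z * z
    return math.sqrt(total / len(predicted))
```

Under this assumed form, an error on a stimulus with a very uncertain subjective score is penalized less than the same error on a stimulus whose score is tightly estimated, which is one way a criterion can avoid rewarding or punishing a metric for differences that the subjective data cannot resolve.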