🤖 AI Summary
Existing super-resolution (SR) models improve visual quality but often introduce high-level semantic hallucinations that distort image content, and conventional low-level metrics (e.g., PSNR, LPIPS) are insensitive to such semantic inconsistencies. Method: This work formally defines and quantifies the high-level semantic fidelity problem in SR, introduces FID-SR, the first large-scale benchmark with human-annotated fidelity scores, and reveals the weak correlation between mainstream metrics and semantic consistency. It further proposes a fidelity evaluator built on multimodal foundation models and integrates it into SR model fine-tuning via fidelity-aware feedback. Contribution/Results: Empirical analysis shows that prevalent SR models suffer from widespread semantic distortion. Incorporating fidelity feedback lets models preserve visual quality while significantly improving semantic consistency, validating the effectiveness and practicality of the proposed evaluation paradigm.
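To see why a low-level metric such as PSNR cannot register semantic change, note that it is a pure function of pixel-wise MSE: any two perturbations with equal MSE score identically, whether one is imperceptible noise and the other redraws content. A minimal sketch (standard PSNR definition; the random images here are illustrative, not from the paper's benchmark):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio: a purely pixel-wise metric."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Small uniform noise: semantically identical to `ref`, yet PSNR is finite.
noisy = np.clip(
    ref.astype(int) + rng.integers(-2, 3, size=ref.shape), 0, 255
).astype(np.uint8)

# PSNR depends only on MSE, so a hallucinated-but-plausible region and a
# noisy-but-faithful region with the same MSE are indistinguishable to it.
```

Because the score collapses all spatial structure into a single MSE, a hallucination that alters, say, a digit or a face can still achieve high PSNR, which is exactly the blind spot the fidelity benchmark targets.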
📝 Abstract
Recent image Super-Resolution (SR) models achieve impressive results in reconstructing details and delivering visually pleasing outputs. However, their powerful generative ability can sometimes hallucinate, changing image content even while attaining high visual quality. This type of high-level change is easily identified by humans yet poorly captured by existing low-level image quality metrics. In this paper, we establish the importance of measuring high-level fidelity for SR models as a complementary criterion that reveals the reliability of generative SR models. We construct the first annotated dataset with fidelity scores for outputs of different SR models, and evaluate how well state-of-the-art (SOTA) SR models actually preserve high-level fidelity. Based on the dataset, we then analyze how existing image quality metrics correlate with fidelity measurement, and further show that this high-level task can be better addressed by foundation models. Finally, by fine-tuning SR models with our fidelity feedback, we show that both semantic fidelity and perceptual quality can be improved, demonstrating the value of our proposed criteria for both model evaluation and optimization. We will release the dataset, code, and models upon acceptance.
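The abstract does not specify how the foundation-model-based fidelity evaluator works internally; a common pattern for such evaluators is to compare semantic embeddings of the ground-truth and SR images. The sketch below illustrates that generic pattern only, assuming some encoder (e.g., a CLIP-style image tower, not the paper's actual model) has already produced feature vectors; all names are hypothetical:

```python
import numpy as np

def semantic_fidelity(emb_hr: np.ndarray, emb_sr: np.ndarray) -> float:
    """Cosine similarity between image embeddings as a fidelity proxy.

    emb_hr / emb_sr: feature vectors for the ground-truth high-resolution
    image and the SR output, produced by a hypothetical foundation-model
    image encoder. Returns a score in [-1, 1]; higher means the SR output
    is semantically closer to the ground truth.
    """
    a = emb_hr / np.linalg.norm(emb_hr)
    b = emb_sr / np.linalg.norm(emb_sr)
    return float(a @ b)
```

In a fidelity-aware fine-tuning loop of the kind the abstract describes, a score like this could serve as an auxiliary reward or loss term alongside the usual reconstruction and perceptual losses, penalizing outputs that look sharp but drift semantically.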