🤖 AI Summary
Existing no-reference image quality assessment (IQA) methods generalize poorly across datasets, particularly under distribution shifts such as user-generated content, synthetic imagery, and low-light conditions. To address this, we propose a generalized IQA framework that leverages the cross-attention mechanism of text-guided latent diffusion models (LDMs). The method introduces learnable, quality-aware text prompts and models prompt-image alignment to derive robust quality representations. Specifically, it extracts cross-attention features from intermediate layers of the LDM denoiser. Extensive cross-database experiments on user-generated, synthetic, and low-light benchmark databases show that the approach generalizes better than other methods in the literature, including under substantial domain shifts. This work points to a new way of leveraging generative model priors in blind IQA, bridging semantic understanding and perceptual quality estimation.
📝 Abstract
The design of no-reference (NR) image quality assessment (IQA) algorithms is extremely important for benchmarking and calibrating user experiences in modern visual systems. A major drawback of state-of-the-art NR-IQA methods is their limited ability to generalize across diverse IQA settings with reasonable distribution shifts. Recent text-to-image generative models, such as latent diffusion models, generate meaningful visual concepts with fine details that correspond to text concepts. In this work, we leverage the denoising process of such diffusion models for generalized IQA by modeling the degree of alignment between learnable quality-aware text prompts and images. In particular, we learn cross-attention maps from intermediate layers of the latent diffusion model's denoiser to capture quality-aware representations of images. In addition, we introduce learnable quality-aware text prompts that make the cross-attention features more quality-aware. Our extensive cross-database experiments on user-generated, synthetic, and low-light benchmark databases show that latent diffusion models achieve superior generalization in IQA compared to other methods in the literature.
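To make the prompt-image alignment idea more concrete, here is a minimal NumPy sketch of a single cross-attention step of the kind used in text-conditioned LDM denoisers: flattened image latent features act as queries, prompt-token embeddings act as keys, and a softmax over tokens yields a per-position attention map. This is not the authors' implementation. The shapes, the random stand-ins for the learnable prompt embeddings and projection weights, and the hypothetical `pooled_quality_feature` mean-pooling step are all assumptions for illustration. Value projections are omitted because only the attention maps are needed here, and how the paper turns such maps into a quality score is not specified in the abstract.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_maps(image_feats, prompt_embeds, w_q, w_k):
    """Scaled dot-product cross-attention from image positions to prompt tokens.

    image_feats:   (N, C) flattened spatial features from one denoiser layer (queries)
    prompt_embeds: (T, D) prompt-token embeddings (keys)
    w_q: (C, d) and w_k: (D, d) query/key projection matrices (assumed shapes)
    Returns an (N, T) map; each row is a distribution over prompt tokens.
    """
    q = image_feats @ w_q
    k = prompt_embeds @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1)


def pooled_quality_feature(attn: np.ndarray) -> np.ndarray:
    # Hypothetical pooling: mean attention each prompt token receives over all
    # spatial positions, giving one value per token (shape (T,)).
    return attn.mean(axis=0)


# Toy usage with random stand-ins (all shapes are illustrative assumptions).
rng = np.random.default_rng(0)
image_feats = rng.standard_normal((64, 320))    # e.g. an 8x8 latent grid, 320 channels
prompt_embeds = rng.standard_normal((4, 768))   # 4 stand-in "learnable" prompt tokens
w_q = rng.standard_normal((320, 64)) / np.sqrt(320)
w_k = rng.standard_normal((768, 64)) / np.sqrt(768)

attn = cross_attention_maps(image_feats, prompt_embeds, w_q, w_k)
feature = pooled_quality_feature(attn)
print(attn.shape, feature.shape)  # (64, 4) (4,)
```

Because each row of the map sums to one, the pooled feature is itself a distribution over prompt tokens. Making `prompt_embeds` genuinely learnable would require an autodiff framework; this NumPy version shows only the forward computation.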