🤖 AI Summary
Blind dehazed image quality assessment (BDQA) must predict quality without any reference image and suffers from scarce, small-scale annotated data. To address these issues, this paper introduces the CLIP model to BDQA for the first time and proposes a dual-branch prompt-learning framework. It employs a global–local joint input mechanism to fuse multi-scale visual features and jointly tunes CLIP’s vision and language encoders with quality-oriented, learnable prompts, enabling end-to-end quality scoring. The method eliminates reliance on handcrafted features and large-scale dehazing quality assessment (DQA) annotations, thereby alleviating data scarcity. Evaluated on two authentic (reference-free) DQA benchmarks, the approach outperforms existing state-of-the-art methods. The source code is publicly available.
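The global–local joint input described above can be illustrated with a minimal sketch: one full-image (global) view is combined with several random local crops before feature extraction. All sizes, the number of crops, and the function name are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def extract_global_local_views(image, crop=64, n_local=4, seed=0):
    """Build one global view (the full image) plus several random local
    crops, mimicking a global-local joint input. Crop size and count
    are illustrative, not the paper's actual settings."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    views = [image]  # global view: the whole dehazed image
    for _ in range(n_local):
        # sample the top-left corner of a local crop uniformly
        y = int(rng.integers(0, h - crop + 1))
        x = int(rng.integers(0, w - crop + 1))
        views.append(image[y:y + crop, x:x + crop])
    return views

# A dummy 224x224 RGB image standing in for a dehazed input.
img = np.zeros((224, 224, 3))
views = extract_global_local_views(img)
```

Each view would then be encoded by CLIP's vision branch, so the model sees both the overall haze-removal result and fine local detail.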
📝 Abstract
Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but their performance is limited by the small scale of DQA datasets. To address this issue, in this paper, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input of CLIP. To accurately map the input hierarchical information of dehazed images into the quality score, we tune both the vision branch and language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods. The code is available at https://github.com/JunFu1995/CLIP-DQA.
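How a CLIP-style model turns embeddings into a scalar quality score can be sketched as follows: the image embedding is compared against a "good quality" / "bad quality" prompt pair, and a softmax over the two cosine similarities yields the score. This mirrors the common CLIP-for-IQA recipe rather than CLIP-DQA's exact learned prompts; the embeddings below are random stand-ins for real CLIP encoder outputs, and the temperature value is an assumption.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_style_quality_score(image_emb, good_prompt_emb, bad_prompt_emb, tau=0.07):
    """Map an image embedding to a scalar in [0, 1] via a softmax over its
    similarities to a 'good quality' / 'bad quality' prompt pair.
    tau is an illustrative temperature, not the paper's value."""
    sims = np.array([cosine_sim(image_emb, good_prompt_emb),
                     cosine_sim(image_emb, bad_prompt_emb)]) / tau
    probs = np.exp(sims - sims.max())  # numerically stable softmax
    probs /= probs.sum()
    return float(probs[0])  # probability of "good" acts as the quality score

# Toy 512-d embeddings standing in for CLIP encoder outputs.
rng = np.random.default_rng(0)
good, bad = rng.normal(size=512), rng.normal(size=512)
img = 0.8 * good + 0.2 * rng.normal(size=512)  # image aligned with "good"
score = clip_style_quality_score(img, good, bad)
```

In the actual method, the prompt embeddings are learnable and tuned jointly with the vision branch so this mapping aligns with human quality ratings.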