🤖 AI Summary
To address the performance degradation that domain shift causes in real-world image dehazing, this paper proposes a language-guided adaptive dehazing framework. Methodologically, it introduces CLIP's cross-modal semantic alignment capability into dehazing, enabling a clean-label-free, language-driven unsupervised quality assessment mechanism. It further designs a region-aware dehazing module and learnable text prompts to achieve semantically aware local optimization under human-like prior guidance. Additionally, a novel loss function is formulated from visual–textual semantic similarity. Extensive experiments demonstrate state-of-the-art performance across multiple real-world benchmarks, with consistent gains on image quality assessment metrics, more natural visual results, and strong generalization. The source code is publicly available.
📝 Abstract
Existing methods have achieved remarkable performance in image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by the Contrastive Language-Image Pre-training (CLIP) model's ability to distinguish between hazy and clean images, we leverage it to evaluate dehazing results. Combined with a region-specific dehazing technique and tailored prompt sets, the CLIP model accurately identifies hazy areas, providing a high-quality, human-like prior that guides the fine-tuning process of pre-trained networks. Extensive experiments demonstrate that HazeCLIP achieves state-of-the-art performance in real-world image dehazing, evaluated through both visual quality and image quality assessment metrics. Code is available at https://github.com/Troivyn/HazeCLIP.
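The core idea, scoring a dehazed image against a "hazy" vs. "clean" prompt pair via CLIP-style embedding similarity, can be sketched as below. This is a minimal illustration, not the paper's implementation: the `haziness_score` function is hypothetical, it uses plain NumPy vectors standing in for real CLIP image/text encoder outputs, and the temperature of 100 mirrors CLIP's usual logit scaling assumption.

```python
import numpy as np

def haziness_score(image_emb, prompt_embs):
    """Probability that the image matches the 'hazy' prompt.

    image_emb:   1-D embedding of the (dehazed) image.
    prompt_embs: 2-D array, row 0 = 'hazy' prompt, row 1 = 'clean' prompt.
    In a real system these would come from CLIP's image and text encoders.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb)
    prompt_embs = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)

    # CLIP-style scaled logits, then softmax over the two prompts
    logits = 100.0 * prompt_embs @ image_emb
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[0]  # index 0 = 'hazy' prompt
```

During fine-tuning, a score like this could serve directly as a loss term: minimizing the "hazy" probability of the network's output pushes results toward the "clean" prompt, without requiring clean ground-truth labels.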