Language Prompt vs. Image Enhancement: Boosting Object Detection With CLIP in Hazy Environments

📅 2026-04-12

🤖 AI Summary
This work addresses the drop in object detection performance in hazy environments, where image degradation weakens object semantics. Rather than relying on conventional image enhancement, it uses language prompts with CLIP to semantically guide the reinforcement of weakened visual features. The study introduces an Approximation of Mutual Exclusion (AME) mechanism that assigns credible weights to the cross-entropy classification loss, and further proposes a Fine-tuned AME (FAME) strategy that adaptively adjusts those weights based on the predicted confidence. Evaluated on HazyCOCO, a large-scale synthetic hazy dataset of 61258 images curated by the authors, the method achieves state-of-the-art detection performance under haze.

📝 Abstract
Object detection in hazy environments is challenging because degraded objects are nearly invisible and their semantics are weakened by environmental noise, making them difficult for detectors to identify. Common approaches rely on image enhancement to boost the weakened semantics, but these methods are limited by the instability of the enhancement modules. This paper proposes a novel solution that employs language prompts to enhance weakened semantics without image enhancement. Specifically, we design Approximation of Mutual Exclusion (AME) to provide credible weights for the Cross-Entropy Loss, resulting in a CLIP-guided Cross-Entropy Loss (CLIP-CE). The provided weights assess how strongly the semantics of each object have been weakened. Through the backpropagation of CLIP-CE, weakened semantics are enhanced, making degraded objects easier to detect. In addition, we present Fine-tuned AME (FAME), which adaptively fine-tunes the weight of AME based on the predicted confidence and thereby compensates for the imbalanced optimization in AME. Furthermore, we present HazyCOCO, a large-scale synthetic hazy dataset comprising 61258 images. Experimental results demonstrate that our method achieves state-of-the-art performance. The code and dataset will be released.
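
The abstract does not give the formulas for AME, FAME, or CLIP-CE, so the following is only a minimal sketch of the general idea: a cross-entropy loss whose per-object weight grows when CLIP-style text-image similarities suggest the object's semantics are weak, and which is then adjusted by the detector's own confidence. Everything specific here is an assumption made for illustration and is not the authors' method: the function names, the weight form `1 + (1 - p_clip)`, the confidence adjustment with `alpha`, and the temperature of 0.01.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_weight(clip_similarities, target, temperature=0.01):
    """Hypothetical stand-in for an AME-style weight (not the paper's formula).

    clip_similarities: cosine similarities between one object region and the
    text prompt of each class. The weight lies in [1, 2] and is larger when
    the prompt of the true class does not clearly win, i.e. when the object's
    semantics appear weakened.
    """
    probs = softmax([s / temperature for s in clip_similarities])
    return 1.0 + (1.0 - probs[target])


def fine_tuned_weight(weight, detector_confidence, alpha=1.0):
    """Hypothetical stand-in for a FAME-style adjustment (an assumption).

    Scales the weight up when the detector's confidence in the true class is
    low; alpha=0 disables the adjustment.
    """
    return weight * (1.0 + alpha * (1.0 - detector_confidence))


def clip_weighted_ce(detector_logits, clip_similarities, target, alpha=1.0):
    """Weighted cross-entropy for a single object proposal."""
    det_probs = softmax(detector_logits)
    w = semantic_weight(clip_similarities, target)
    w = fine_tuned_weight(w, det_probs[target], alpha)
    return -w * math.log(det_probs[target])


# Same detector logits, two made-up similarity vectors for the true class 0.
detector_logits = [2.0, 0.5, 0.1]
clear_object = [0.30, 0.20, 0.18]  # true-class prompt clearly wins
hazy_object = [0.22, 0.21, 0.20]   # prompts are nearly indistinguishable

loss_clear = clip_weighted_ce(detector_logits, clear_object, target=0)
loss_hazy = clip_weighted_ce(detector_logits, hazy_object, target=0)
print(loss_clear, loss_hazy)
```

In this sketch the hazy object receives the larger loss, so its gradient contribution is larger during backpropagation, which is the qualitative behavior the abstract attributes to CLIP-CE. In a real detector the weights would be computed per proposal from CLIP embeddings and applied inside the framework's classification loss.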
Problem

Research questions and friction points this paper is trying to address.

object detection
hazy environments
semantic weakening
image degradation
CLIP
Innovation

Methods, ideas, or system contributions that make the work stand out.

language prompt
CLIP-guided loss
semantic enhancement
hazy object detection
mutual exclusion approximation
👥 Authors
Jian Pang
College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, P.R. China
Bingfeng Zhang
College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, P.R. China
Jin Wang
College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, P.R. China
Baodi Liu
College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, P.R. China
Dapeng Tao
Yunnan University
Weifeng Liu
University of Florida
Machine Learning, Signal Processing, Kernel adaptive filtering