Language Prompt vs. Image Enhancement: Boosting Object Detection With CLIP in Hazy Environments

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the drop in object detection performance in hazy environments, where image degradation weakens object semantics. Instead of relying on image enhancement, it uses language prompts with CLIP to strengthen the weakened semantic features. The study introduces Approximation of Mutual Exclusion (AME), which supplies credible weights for the cross-entropy classification loss, and Fine-tuned AME (FAME), which adaptively adjusts those weights based on predicted confidence to compensate for AME's imbalanced optimization. The authors also present HazyCOCO, a large-scale synthetic hazy dataset of 61,258 images, and report state-of-the-art detection performance.

📝 Abstract

Object detection in hazy environments is challenging because degraded objects are nearly invisible and their semantics are weakened by environmental noise, making it difficult for detectors to identify them. Common approaches use image enhancement to boost the weakened semantics, but these methods are limited by the instability of the enhancement modules. This paper proposes a novel solution that employs language prompts to enhance weakened semantics without image enhancement. Specifically, we design Approximation of Mutual Exclusion (AME) to provide credible weights for Cross-Entropy Loss, resulting in CLIP-guided Cross-Entropy Loss (CLIP-CE). The provided weights assess the semantic weakening of objects. Through the backpropagation of CLIP-CE, weakened semantics are enhanced, making degraded objects easier to detect. In addition, we present Fine-tuned AME (FAME), which adaptively fine-tunes the weight of AME based on the predicted confidence. The proposed FAME compensates for the imbalanced optimization in AME. Furthermore, we present HazyCOCO, a large-scale synthetic hazy dataset comprising 61,258 images. Experimental results demonstrate that our method achieves state-of-the-art performance. The code and dataset will be released.
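
The abstract does not give the AME formula, so the following is only a minimal sketch of the general idea of a per-object weighted cross-entropy loss, not the authors' CLIP-CE. The function `hypothetical_semantic_weights` and its rule (weight grows as the CLIP-style probability of the true class shrinks) are assumptions made purely for illustration, and the `clip_scores` array stands in for image-text similarity scores that a real CLIP model would produce. FAME's confidence-based adjustment is not modeled here.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def weighted_cross_entropy(logits, labels, weights):
    """Mean cross-entropy over objects, each term scaled by its weight."""
    probs = softmax(logits)
    true_class_prob = probs[np.arange(len(labels)), labels]
    per_object_loss = -np.log(true_class_prob + 1e-12)
    return float(np.mean(weights * per_object_loss))


def hypothetical_semantic_weights(clip_scores, labels):
    """ASSUMPTION, not the paper's AME: weight = 1 + (1 - p_true).

    p_true is the softmax probability that the CLIP-style scores assign
    to the ground-truth class, so objects whose semantics look weaker
    receive a larger weight, bounded to the range [1, 2].
    """
    p_true = softmax(clip_scores)[np.arange(len(labels)), labels]
    return 1.0 + (1.0 - p_true)


# Toy example: two objects, three classes (all values are made up).
detector_logits = np.array([[2.0, 0.5, 0.1], [0.3, 0.4, 0.2]])
clip_scores = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([0, 1])

weights = hypothetical_semantic_weights(clip_scores, labels)
loss = weighted_cross_entropy(detector_logits, labels, weights)
```

In this toy setup the second object, whose scores are flat across classes, gets the larger weight, so its classification error contributes more to the gradient. That mirrors the abstract's statement that the weights assess semantic weakening, under the stated assumptions.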

Problem

Research questions and friction points this paper is trying to address.

object detection

hazy environments

semantic weakening

image degradation

CLIP

Innovation

Methods, ideas, or system contributions that make the work stand out.

language prompt

CLIP-guided loss

semantic enhancement