Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network

📅 2026-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of underwater image degradation (color distortion, low contrast, and poor visibility) caused by light absorption and scattering. Existing methods often generalize poorly due to rigid physical assumptions or insufficient training data. To overcome these limitations, the authors propose an enhancement framework that integrates Retinex theory with language-guided semantic priors. The framework comprises a prior-free illumination estimator, a cross-modal text alignment module, and a semantics-guided restorer, leveraging textual descriptions generated with the Contrastive Language-Image Pre-training (CLIP) model to provide high-level semantic guidance. The study pioneers the incorporation of textual semantics into underwater image enhancement, introduces LUIQD-TD, the first large-scale image-text underwater dataset, and designs an Image-Text Semantic Similarity (ITSS) loss. Experiments show that the method achieves state-of-the-art or comparable performance against 15 leading approaches across four public benchmarks and a newly curated dataset, improving both visual quality and semantic fidelity.
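The Retinex foundation referenced above models an observed image as the element-wise product of reflectance and illumination, I = R ⊙ L. The paper's Prior-Free Illumination Estimator is a learned network whose details are not given in this listing; as an illustrative baseline only, the sketch below performs a classic hand-crafted decomposition using the per-pixel max-channel value as the illumination estimate (`retinex_decompose` and its max-channel prior are assumptions for illustration, not the authors' method):

```python
import numpy as np

def retinex_decompose(img: np.ndarray, eps: float = 1e-6):
    """Toy Retinex decomposition I = R * L.

    img: H x W x 3 array with values in [0, 1].
    Illumination L is approximated by the per-pixel channel maximum
    (a simple hand-crafted prior, NOT the paper's learned estimator);
    reflectance R is recovered by element-wise division.
    """
    L = img.max(axis=-1, keepdims=True)      # H x W x 1 illumination estimate
    R = img / (L + eps)                      # reflectance, bounded in [0, 1)
    return R, L
```

Multiplying the two factors back together (R * (L + eps)) reconstructs the input exactly, which is the invariant any Retinex-style decomposition must preserve; learned estimators differ only in how they split the product.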

📝 Abstract
Underwater images often suffer from severe degradation caused by light absorption and scattering, leading to color distortion, low contrast and reduced visibility. Existing Underwater Image Enhancement (UIE) methods can be divided into two categories, i.e., prior-based and learning-based methods. The former rely on rigid physical assumptions that limit adaptability, while the latter often face data scarcity and weak generalization. To address these issues, we propose a Physics-Semantics-Guided Underwater Image Enhancement Network (PSG-UIENet), which couples Retinex-grounded illumination correction with language-informed guidance. The network comprises a Prior-Free Illumination Estimator, a Cross-Modal Text Aligner and a Semantics-Guided Image Restorer. In particular, the restorer leverages textual descriptions generated by the Contrastive Language-Image Pre-training (CLIP) model to inject high-level semantics for perceptually meaningful guidance. Since multimodal UIE datasets are not publicly available, we also construct a large-scale image-text UIE dataset, namely, LUIQD-TD, which contains 6,418 image-reference-text triplets. To explicitly measure and optimize semantic consistency between textual descriptions and images, we further design an Image-Text Semantic Similarity (ITSS) loss function. To our knowledge, this study makes the first effort to introduce both textual guidance and a multimodal dataset into UIE tasks. Extensive experiments on our dataset and four publicly available datasets demonstrate that the proposed PSG-UIENet achieves superior or comparable performance against fifteen state-of-the-art methods.
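The exact form of the ITSS loss is not given in this listing. A common way to score image-text semantic consistency with CLIP-style encoders is one minus the cosine similarity of the L2-normalized image and text embeddings; the minimal sketch below assumes that formulation (the function name `itss_loss` and the cosine form are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def itss_loss(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Sketch of an image-text semantic similarity loss.

    image_emb, text_emb: batch x dim embedding arrays (e.g., from
    CLIP image/text encoders). Returns mean (1 - cosine similarity),
    so perfectly aligned pairs score 0 and orthogonal pairs score 1.
    NOTE: assumed formulation; the paper's ITSS loss may differ.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    cos = np.sum(img * txt, axis=-1)         # per-pair cosine similarity
    return float(np.mean(1.0 - cos))
```

Minimizing such a term during training pulls the enhanced image's embedding toward the embedding of its paired textual description, which is the stated goal of the ITSS objective.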
Problem

Research questions and friction points this paper is trying to address.

Underwater Image Enhancement
Color Distortion
Low Contrast
Light Scattering
Image Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retinex
text-guided enhancement
multimodal learning
CLIP
underwater image enhancement
Shixuan Xu
State Key Laboratory of Physical Oceanography and the Faculty of Information Science and Engineering, Ocean University of China, Qingdao, 266100
Yabo Liu
State Key Laboratory of Physical Oceanography and the Faculty of Information Science and Engineering, Ocean University of China, Qingdao, 266100
Junyu Dong
Ocean University of China
Xinghui Dong
Ocean University of China
Computer Vision · Visual Perception · Texture Analysis · Non-destructive Testing