Empowering Semantic-Sensitive Underwater Image Enhancement with VLM

πŸ“… 2026-03-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitation of existing underwater image enhancement methods, which often neglect semantic information, leading to enhanced results that deviate from natural image distributions and degrade performance in downstream vision tasks. To overcome this, the study introduces a vision-language model (VLM) into underwater image enhancement for the first time. The VLM generates textual descriptions of key objects in degraded images, which are then leveraged by a text-image alignment model to construct a spatial semantic guidance map. A novel semantics-driven dual-guidance mechanism is proposed to steer the enhancement network toward reconstructing semantically salient regions. Experimental results demonstrate significant improvements in perceptual quality metrics and consistent performance gains across downstream tasks such as object detection and segmentation, validating the method’s effectiveness and generalizability.
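The guidance-map construction described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a CLIP-style text-image alignment model has already produced per-patch image embeddings and a text embedding for the key-object description, and simply projects the text onto the patches to get a spatial relevance map. The function name and shapes are hypothetical.

```python
import numpy as np

def guidance_map(patch_emb, text_emb):
    """Project a text embedding onto per-patch image embeddings
    (assumed to come from a CLIP-style alignment model) to obtain
    a spatial semantic guidance map in [0, 1].

    patch_emb: (H, W, D) per-patch image features
    text_emb:  (D,) text feature for the key-object description
    """
    # Cosine similarity between every patch and the text description
    p = patch_emb / np.linalg.norm(patch_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = p @ t  # (H, W) similarity map
    # Min-max normalize so the map can weight the enhancement network
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
```

Patches whose embeddings align with the description receive values near 1; irrelevant background approaches 0.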

πŸ“ Abstract
In recent years, learning-based underwater image enhancement (UIE) techniques have rapidly evolved. However, distribution shifts between high-quality enhanced outputs and natural images can hinder semantic cue extraction for downstream vision tasks, thereby limiting the adaptability of existing enhancement models. To address this challenge, this work proposes a new learning mechanism that leverages Vision-Language Models (VLMs) to endow UIE models with semantic-sensitive capabilities. Concretely, our strategy first generates textual descriptions of key objects in a degraded image via VLMs. A text-image alignment model then remaps these descriptions back onto the image to produce a spatial semantic guidance map. This map steers the UIE network through a dual-guidance mechanism, which combines cross-attention and an explicit alignment loss. The mechanism forces the network to focus its restorative power on semantic-sensitive regions during image reconstruction, rather than pursuing a globally uniform improvement, thereby ensuring the faithful restoration of key object features. Experiments confirm that when our strategy is applied to different UIE baselines, it significantly boosts their performance on perceptual quality metrics and improves their results on detection and segmentation tasks, validating its effectiveness and adaptability.
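The dual-guidance mechanism can be sketched under simplifying assumptions. The paper uses cross-attention; the stand-in below replaces it with a plain multiplicative modulation by the guidance map, and the alignment loss is rendered as a guidance-weighted pixel loss. All function names are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def semantic_modulate(features, guidance):
    """Simplified stand-in for the cross-attention branch: features in
    semantically salient regions (high guidance values) are amplified.

    features: (C, H, W) feature map; guidance: (H, W) map in [0, 1].
    """
    return features * (1.0 + guidance[None, :, :])

def semantic_alignment_loss(enhanced, reference, guidance):
    """Stand-in for the explicit alignment loss: a pixel loss weighted
    by the guidance map, so reconstruction errors in key-object regions
    are penalized more heavily than background errors."""
    weight = 1.0 + guidance[None, :, :]
    return float(np.mean(weight * (enhanced - reference) ** 2))
```

With an all-zero guidance map both functions reduce to their unguided counterparts (identity modulation, plain MSE), which makes the semantic steering easy to isolate in ablations.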
Problem

Research questions and friction points this paper is trying to address.

underwater image enhancement
semantic cues
distribution shift
vision-language models
downstream vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model
Semantic-Sensitive Enhancement
Underwater Image Enhancement
Text-Image Alignment
Dual-Guidance Mechanism
πŸ”Ž Similar Papers
No similar papers found.
Guodong Fan
Tianjin University
Service Computing · Software Engineering · Large Language Models · Combinatorial Optimization
Shengning Zhou
Shandong Technology and Business University, Yantai, China
Genji Yuan
Shandong Technology and Business University, Yantai, China
Huiyu Li
Research Advisor, Economic Research, Federal Reserve Bank of San Francisco
growth · firm dynamics · econometrics and computation
Jingchun Zhou
Dalian Maritime University, Dalian, China
Jinjiang Li
Shandong Technology and Business University, Yantai, China