From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning

📅 2025-07-11
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Low-light image enhancement and high-level visual understanding have long been treated as disjoint tasks: enhancement relies on fragile hand-crafted priors with poor generalization, while understanding suffers from scarce annotations and task-specific adaptation. This paper proposes GEFU (Generalized Enhancement For Understanding), the first unified paradigm bridging the two. Its core is SCUF, a Semantically Consistent Unsupervised Fine-tuning framework built on pretrained diffusion models. SCUF integrates an illumination-aware image prompt, a cycle-attention adapter, and a dual consistency constraint coupling reflectance estimation and captioning. It enables zero-shot generalization and semantic preservation without downstream task annotations, and it supports joint optimization across diverse vision tasks, including classification, detection, and segmentation, under a single model. Experiments demonstrate state-of-the-art results in both perceptual quality and multiple understanding benchmarks, validating the efficacy and scalability of co-optimizing enhancement and understanding.

📝 Abstract
Low-level enhancement and high-level visual understanding in low-light vision have traditionally been treated separately. Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors, limiting generalization. Evaluation mainly focuses on visual quality rather than downstream performance. Low-light visual understanding, constrained by scarce labeled data, primarily uses task-specific domain adaptation, which lacks scalability. To address these challenges, we build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU). This paradigm improves both generalization and scalability. To address the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization performance. Building on this, we propose Semantically Consistent Unsupervised Fine-tuning (SCUF). Specifically, to overcome text prompt limitations, we introduce an illumination-aware image prompt to explicitly guide image generation and propose a cycle-attention adapter to maximize its semantic potential. To mitigate semantic degradation in unsupervised training, we propose caption and reflectance consistency to learn high-level semantics and image-level spatial semantics. Extensive experiments demonstrate that our proposed method outperforms current state-of-the-art methods in traditional image quality and GEFU tasks including classification, detection, and semantic segmentation.
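The reflectance consistency mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple Retinex decomposition (image = reflectance × illumination, with illumination approximated by the per-pixel channel maximum); the paper's actual estimator and loss are not specified here, so the functions below are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of a Retinex-style reflectance consistency (assumption:
# illumination is approximated by the per-pixel channel maximum; the paper's
# actual decomposition and loss may differ).

def illumination(pixel):
    """Approximate per-pixel illumination as the maximum channel value."""
    return max(pixel)

def reflectance(image):
    """Retinex-style reflectance: each channel divided by its illumination."""
    eps = 1e-6  # avoid division by zero on black pixels
    return [[c / (illumination(p) + eps) for c in p] for p in image]

def reflectance_consistency(low, enhanced):
    """Mean L1 distance between reflectance maps of input and enhancement."""
    r_low, r_enh = reflectance(low), reflectance(enhanced)
    n = sum(len(p) for p in r_low)
    return sum(abs(a - b)
               for pl, pe in zip(r_low, r_enh)
               for a, b in zip(pl, pe)) / n

# A brightened version of the same scene keeps (nearly) the same reflectance,
# so the consistency term stays close to zero.
low = [(0.1, 0.2, 0.05), (0.3, 0.1, 0.2)]
bright = [(0.2, 0.4, 0.1), (0.6, 0.2, 0.4)]  # each pixel scaled by 2
loss = reflectance_consistency(low, bright)
```

Intuitively, this penalizes an enhancer that changes scene content (reflectance) rather than just lighting, which is the spatial-semantics role the abstract assigns to reflectance consistency.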
Problem

Research questions and friction points this paper is trying to address.

Bridging low-light enhancement and understanding via generalized paradigm
Overcoming text prompt limits with illumination-aware image guidance
Mitigating semantic degradation in unsupervised fine-tuning via consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pretrained generative diffusion models
Introduces illumination-aware image prompt
Proposes cycle-attention adapter for semantics
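The caption-consistency idea behind the semantic constraints can likewise be sketched: the enhanced image should admit the same description as the input. The bag-of-words embedding and example captions below are toy stand-ins for the paper's captioner and text encoder; all names are hypothetical.

```python
# Toy sketch of caption consistency (assumption: captions are compared via a
# bag-of-words cosine similarity; the paper's captioner and text encoder are
# learned models, not this stand-in).
from collections import Counter
import math

def embed(caption):
    """Toy caption embedding: bag-of-words term counts."""
    return Counter(caption.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def caption_consistency_loss(caption_low, caption_enhanced):
    """Low when both captions describe the same scene, high otherwise."""
    return 1.0 - cosine(embed(caption_low), embed(caption_enhanced))

loss_same = caption_consistency_loss("a car on a dark street",
                                     "a car on a street")
loss_diff = caption_consistency_loss("a car on a dark street",
                                     "two dogs in a park")
```

Minimizing such a term during unsupervised fine-tuning discourages the enhancer from hallucinating or erasing objects, which is the semantic-degradation failure mode the abstract targets.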
👥 Authors
Sen Wang (East China Normal University)
Shao Zeng (Tencent Youtu Lab)
Tianjun Gu (East China Normal University)
Zhizhong Zhang (Associate Researcher, East China Normal University)
Ruixin Zhang (Tencent)
Shouhong Ding (Tencent Youtu Lab)
Jingyun Zhang (PhD student, Beihang University)
Jun Wang (Tencent WeChat Pay Lab)
Xin Tan (East China Normal University)
Yuan Xie (East China Normal University)
Lizhuang Ma (East China Normal University)