From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning

📅 2025-07-11
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Low-light image enhancement and high-level visual understanding have long been treated as disjoint tasks: enhancement relies on fragile hand-crafted priors with poor generalization, while understanding suffers from scarce annotations and task-specific adaptation. This paper proposes GEFU (Generalized Enhancement For Understanding), the first unified paradigm bridging the two. Its core is SCUF, a Semantically Consistent Unsupervised Fine-tuning framework built on pretrained diffusion models. SCUF integrates an illumination-aware image prompt, a cycle-attention adapter, and a dual consistency constraint coupling reflectance estimation and captioning. It enables zero-shot generalization and semantic preservation without downstream task annotations, and it supports joint optimization across diverse vision tasks, including classification, detection, and segmentation, under a single model. Experiments demonstrate state-of-the-art results in both perceptual quality and multiple understanding benchmarks, validating the efficacy and scalability of co-optimizing enhancement and understanding.

📝 Abstract
Low-level enhancement and high-level visual understanding in low-light vision have traditionally been treated separately. Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors, limiting generalization. Evaluation mainly focuses on visual quality rather than downstream performance. Low-light visual understanding, constrained by scarce labeled data, primarily uses task-specific domain adaptation, which lacks scalability. To address these challenges, we build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU). This paradigm improves both generalization and scalability. To address the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization performance. Building on this, we propose Semantically Consistent Unsupervised Fine-tuning (SCUF). Specifically, to overcome text prompt limitations, we introduce an illumination-aware image prompt to explicitly guide image generation and propose a cycle-attention adapter to maximize its semantic potential. To mitigate semantic degradation in unsupervised training, we propose caption and reflectance consistency to learn high-level semantics and image-level spatial semantics. Extensive experiments demonstrate that our proposed method outperforms current state-of-the-art methods in traditional image quality and GEFU tasks including classification, detection, and semantic segmentation.
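The reflectance consistency mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple Retinex decomposition (image = reflectance × illumination, with illumination approximated by the per-pixel channel maximum); the paper's actual estimator and loss are not specified here, so the functions below are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of a Retinex-style reflectance consistency (assumption:
# illumination is approximated by the per-pixel channel maximum; the paper's
# actual decomposition and loss may differ).

def illumination(pixel):
    """Approximate per-pixel illumination as the maximum channel value."""
    return max(pixel)

def reflectance(image):
    """Retinex-style reflectance: each channel divided by its illumination."""
    eps = 1e-6  # avoid division by zero on black pixels
    return [[c / (illumination(p) + eps) for c in p] for p in image]

def reflectance_consistency(low, enhanced):
    """Mean L1 distance between reflectance maps of input and enhancement."""
    r_low, r_enh = reflectance(low), reflectance(enhanced)
    n = sum(len(p) for p in r_low)
    return sum(abs(a - b)
               for pl, pe in zip(r_low, r_enh)
               for a, b in zip(pl, pe)) / n

# A brightened version of the same scene keeps (nearly) the same reflectance,
# so the consistency term stays close to zero.
low = [(0.1, 0.2, 0.05), (0.3, 0.1, 0.2)]
bright = [(0.2, 0.4, 0.1), (0.6, 0.2, 0.4)]  # each pixel scaled by 2
loss = reflectance_consistency(low, bright)
```

Intuitively, this penalizes an enhancer that changes scene content (reflectance) rather than just lighting, which is the spatial-semantics role the abstract assigns to reflectance consistency.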
Problem

Research questions and friction points this paper is trying to address.

Bridging low-light enhancement and understanding via generalized paradigm
Overcoming text prompt limits with illumination-aware image guidance
Mitigating semantic degradation in unsupervised fine-tuning via consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pretrained generative diffusion models
Introduces illumination-aware image prompt
Proposes cycle-attention adapter for semantics
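The caption-consistency idea behind the semantic constraints can likewise be sketched: the enhanced image should admit the same description as the input. The bag-of-words embedding and example captions below are toy stand-ins for the paper's captioner and text encoder; all names are hypothetical.

```python
# Toy sketch of caption consistency (assumption: captions are compared via a
# bag-of-words cosine similarity; the paper's captioner and text encoder are
# learned models, not this stand-in).
from collections import Counter
import math

def embed(caption):
    """Toy caption embedding: bag-of-words term counts."""
    return Counter(caption.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def caption_consistency_loss(caption_low, caption_enhanced):
    """Low when both captions describe the same scene, high otherwise."""
    return 1.0 - cosine(embed(caption_low), embed(caption_enhanced))

loss_same = caption_consistency_loss("a car on a dark street",
                                     "a car on a street")
loss_diff = caption_consistency_loss("a car on a dark street",
                                     "two dogs in a park")
```

Minimizing such a term during unsupervised fine-tuning discourages the enhancer from hallucinating or erasing objects, which is the semantic-degradation failure mode the abstract targets.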
👥 Authors
Sen Wang (East China Normal University)
Shao Zeng (Tencent Youtu Lab)
Tianjun Gu (East China Normal University)
Zhizhong Zhang (Associate Researcher, East China Normal University)
Ruixin Zhang (Tencent)
Shouhong Ding (Tencent Youtu Lab)
Jingyun Zhang (PhD student, Beihang University)
Jun Wang (Tencent WeChat Pay Lab)
Xin Tan (East China Normal University)
Yuan Xie (East China Normal University)
Lizhuang Ma (East China Normal University)