🤖 AI Summary
Low-light image enhancement has long been hindered by reliance on paired training data, while existing unsupervised methods struggle to model unknown degradations. To address this, we propose Zero-LED, a zero-reference lighting estimation diffusion model that integrates diffusion processes with zero-reference learning to enable end-to-end unsupervised enhancement without ground-truth exposure labels. Methodologically, Zero-LED employs an initial optimization network to guide the diffusion prior and introduces multi-objective bidirectional constraints, frequency-domain feature modeling, and semantically guided appearance reconstruction for fine-grained, joint recovery of structure, texture, and semantics. Extensive experiments demonstrate that Zero-LED outperforms state-of-the-art methods across multiple benchmarks, with stronger generalization and robust adaptation to real-world scenarios. The code will be made publicly available.
📝 Abstract
Diffusion model-based low-light image enhancement methods rely heavily on paired training data, which limits their broad applicability. Meanwhile, existing unsupervised methods lack an effective way to bridge unknown degradations. To address these limitations, we propose Zero-LED, a novel zero-reference lighting estimation diffusion model for low-light image enhancement. It leverages the stable convergence of diffusion models to bridge the gap between the low-light domain and the real normal-light domain, and alleviates the dependence on paired training data via zero-reference learning. Specifically, we first design an initial optimization network to preprocess the input image and impose bidirectional constraints between the diffusion model and the initial optimization network through multiple objective functions. The degradation factors of the real-world scene are then optimized iteratively to achieve effective light enhancement. In addition, we explore a frequency-domain-based, semantically guided appearance reconstruction module that encourages fine-grained feature alignment of the recovered image and satisfies subjective expectations. Finally, extensive experiments demonstrate the superiority of our approach over other state-of-the-art methods, along with stronger generalization capabilities. We will open-source the code upon acceptance of the paper.
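The abstract does not give equations for the frequency-domain appearance reconstruction module. As a minimal illustrative sketch only (assuming, as one common choice, that the module aligns the amplitude spectra of the restored and guidance images; the function name and loss form are hypothetical, not the paper's), a frequency-domain alignment term might look like:

```python
import numpy as np

def frequency_alignment_loss(restored: np.ndarray, reference: np.ndarray) -> float:
    """L1 distance between the 2-D FFT amplitude spectra of two
    single-channel images.

    The amplitude spectrum roughly captures texture and illumination
    energy, so matching it encourages fine-grained structural agreement
    without requiring pixel-wise paired supervision.
    """
    # 2-D FFT of each image; keep only the amplitude |F|, discard phase.
    amp_restored = np.abs(np.fft.fft2(restored))
    amp_reference = np.abs(np.fft.fft2(reference))
    # Mean absolute difference between the two spectra.
    return float(np.mean(np.abs(amp_restored - amp_reference)))

# Usage: identical images give zero loss; a globally brightened copy
# shifts the spectrum and yields a positive loss.
img = np.random.default_rng(0).random((64, 64))
print(frequency_alignment_loss(img, img))            # 0.0
print(frequency_alignment_loss(img, img * 1.5) > 0)  # True
```

In practice such a term would be one of the "multiple objective functions" the abstract mentions, computed on deep features rather than raw pixels; this sketch only shows the spectral-alignment idea.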