🤖 AI Summary
Low-light image and video understanding is held back by the scarcity of annotated real-world data, while existing synthesis methods rely on unrealistic noise models. This paper proposes a zero-shot framework for low-light image and video synthesis. Its core is a self-supervised Degradation Estimation Network (DEN) that requires no camera metadata and generates realistic noise directly in the sRGB domain by estimating the parameters of physics-informed noise distributions. Because the approach is zero-shot, it can produce a diverse range of realistic noise characteristics rather than only recreating those of the training data. Methods trained on the synthetic data show improvements of up to 24% in KL divergence (KLD) for synthetic noise replication, 21% in LPIPS for video enhancement, and 62% in AP$_{50-95}$ for object detection. These results ease the training and evaluation bottlenecks caused by the scarcity of authentic low-light data.
📝 Abstract
Low-light conditions pose significant challenges for both human and machine annotation. This in turn has led to a lack of research into machine understanding of low-light images and, in particular, videos. A common approach is to apply annotations obtained from high-quality datasets to synthetically created low-light versions. However, these approaches are often limited by their use of unrealistic noise models. In this paper, we propose a new Degradation Estimation Network (DEN), which synthetically generates realistic standard RGB (sRGB) noise without requiring camera metadata. This is achieved by estimating the parameters of physics-informed noise distributions, with the network trained in a self-supervised manner. This zero-shot approach allows our method to generate synthetic noisy content with a diverse range of realistic noise characteristics, unlike other methods, which focus on recreating the noise characteristics of the training data. We evaluate our proposed synthetic pipeline using various methods trained on its synthetic data for typical low-light tasks, including synthetic noise replication, video enhancement, and object detection, showing improvements of up to 24% in KLD, 21% in LPIPS, and 62% in AP$_{50-95}$, respectively.
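To make the idea of a parameterized noise distribution concrete, the following is a minimal sketch and not the DEN itself. It darkens a clean image and adds signal-dependent noise using a heteroscedastic Gaussian approximation of shot and read noise, then compares two noise samples with a histogram-based KL divergence. The function names, the parameters `gain`, `shot_scale`, and `read_std`, and their default values are all assumptions made for illustration; the abstract does not specify which distributions or parameters the network estimates, and noise in real sRGB images is further shaped by the camera pipeline.

```python
import numpy as np


def synthesize_low_light(clean, gain=0.2, shot_scale=0.01, read_std=0.02, rng=None):
    """Darken a clean image in [0, 1] and add signal-dependent noise.

    All parameters are hypothetical stand-ins, not values from the paper:
      gain       -- exposure scaling applied before noise is added
      shot_scale -- noise variance contributed per unit of signal
      read_std   -- standard deviation of signal-independent noise
    """
    rng = np.random.default_rng() if rng is None else rng
    dark = np.clip(clean, 0.0, 1.0) * gain
    # Heteroscedastic Gaussian approximation: variance grows linearly
    # with the darkened signal, plus a constant read-noise floor.
    std = np.sqrt(shot_scale * dark + read_std ** 2)
    noisy = dark + rng.normal(0.0, 1.0, size=dark.shape) * std
    return np.clip(noisy, 0.0, 1.0), dark


def histogram_kld(a, b, bins=64, value_range=(-0.5, 0.5), eps=1e-8):
    """KL(P_a || P_b) between normalized histograms of two noise samples."""
    p, _ = np.histogram(a, bins=bins, range=value_range)
    q, _ = np.histogram(b, bins=bins, range=value_range)
    # eps keeps empty bins from producing log(0) or division by zero.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


# Example: synthesize a noisy dark image and extract its noise component.
rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))
noisy, dark = synthesize_low_light(clean, rng=rng)
noise = noisy - dark
```

In a setup like the one the abstract describes, a network would estimate such distribution parameters from data rather than having them fixed by hand, and a divergence of this kind would measure how closely synthetic noise matches real noise.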