🤖 AI Summary
To address the severe detail loss and color distortion in extremely low-light RAW images, where the signal-to-noise ratio approaches zero, this work integrates a pre-trained latent diffusion model (LDM) into the RAW-domain neural ISP task. We propose lightweight taming modules that modulate intermediate features of the UNet backbone to inject RAW information into the denoising process. We also design a decoupled reconstruction scheme: low-frequency content is generated in latent space, while high-frequency details are preserved in the decoding phase. The large-scale diffusion model itself is not fine-tuned; adaptation is achieved through the lightweight feature-level taming modules alone. Experiments on representative low-light RAW enhancement datasets show that the method achieves state-of-the-art quantitative performance (PSNR/SSIM) and better visual quality than strong existing ISP-based and generative baselines. These results highlight the effectiveness of diffusion-based generative priors for neural ISP under extremely low-light conditions.
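The abstract does not state the exact form of the feature modulation. As a minimal sketch, assuming a FiLM-style affine modulation (a per-channel scale and shift predicted from a RAW-derived conditioning vector), a taming module might look like the following. The class name `TamingModule`, the zero initialization, and the list-of-lists feature layout are hypothetical choices for illustration, not the authors' implementation; a real version would operate on framework tensors inside the frozen UNet.

```python
from typing import List

Matrix = List[List[float]]  # feature map laid out as [channel][spatial position]


class TamingModule:
    """Hypothetical taming module: FiLM-style modulation of frozen UNet features.

    Illustrative sketch only; the affine modulation form is an assumption.
    """

    def __init__(self, cond_dim: int, channels: int) -> None:
        # Trainable linear maps from the RAW condition to per-channel gamma/beta.
        # Zero-initialized so the frozen UNet's features pass through unchanged
        # before training (an assumed design choice).
        self.w_gamma: Matrix = [[0.0] * cond_dim for _ in range(channels)]
        self.w_beta: Matrix = [[0.0] * cond_dim for _ in range(channels)]

    @staticmethod
    def _matvec(w: Matrix, v: List[float]) -> List[float]:
        # Plain matrix-vector product: one output value per channel.
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    def __call__(self, features: Matrix, raw_cond: List[float]) -> Matrix:
        gamma = self._matvec(self.w_gamma, raw_cond)
        beta = self._matvec(self.w_beta, raw_cond)
        # h' = h * (1 + gamma_c) + beta_c, applied per channel c
        return [
            [h * (1.0 + g) + b for h in channel]
            for channel, g, b in zip(features, gamma, beta)
        ]


# Usage: identity at initialization, then a nonzero shift on channel 0.
feats = [[1.0, 2.0], [3.0, 4.0]]
tm = TamingModule(cond_dim=3, channels=2)
cond = [0.5, 0.1, 0.2]
assert tm(feats, cond) == feats
tm.w_beta[0][0] = 1.0
print(tm(feats, cond))  # [[1.5, 2.5], [3.0, 4.0]]
```

Only the small `w_gamma`/`w_beta` maps would be trained in this sketch, which mirrors the idea of adapting a frozen diffusion model through lightweight modules rather than fine-tuning it.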
📝 Abstract
Enhancing a low-light, noisy RAW image into a well-exposed, clean sRGB image is a significant challenge for modern digital cameras. Prior approaches struggle to recover fine-grained details and true scene colors in extremely low-light environments because the SNR is near zero. Meanwhile, diffusion models have made significant progress in general-domain image generation. In this paper, we propose leveraging a pre-trained latent diffusion model to perform the neural ISP for enhancing extremely low-light images. Specifically, to tailor the pre-trained latent diffusion model to the RAW domain, we train a set of lightweight taming modules that inject RAW information into the diffusion denoising process by modulating the intermediate features of the UNet. We further observe that UNet denoising and decoder reconstruction play different roles in the latent diffusion model, which inspires us to decompose the low-light image enhancement task into low-frequency content generation in the latent space and high-frequency detail maintenance in the decoding phase. Extensive experiments on representative datasets demonstrate that our simple design not only achieves state-of-the-art performance in quantitative evaluations but also shows significant superiority in visual comparisons over strong baselines, highlighting the effectiveness of powerful generative priors for neural ISP under extremely low-light environments. The project page is available at https://csqiangwen.github.io/projects/ldm-isp/
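The abstract does not say how the low- and high-frequency components are separated. The following minimal sketch only illustrates what such a split means, using a box blur as a stand-in low-pass filter (an assumption) on a small single-channel image in pixel space. In the paper's framing, the low-frequency part corresponds to content generated in latent space and the high-frequency residual to details maintained in the decoding phase; the helper names `box_blur` and `split_frequencies` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # single-channel image as [row][column]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clamping at borders."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse edge values.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out


def split_frequencies(img: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Split an image into a low-pass part and the high-frequency residual."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(pr, lr)] for pr, lr in zip(img, low)]
    return low, high


# Usage: a single bright pixel is mostly high-frequency detail.
img = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
low, high = split_frequencies(img)
print(low[1][1], high[1][1])  # 1.0 8.0
recon = [[l + h for l, h in zip(lr, hr)] for lr, hr in zip(low, high)]
assert recon == img  # the two bands sum back to the input
```

Because the high band is defined as a residual, the two parts always sum to the input; a blurry low band alone loses the fine detail, which is why a separate mechanism for preserving high frequencies matters.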