🤖 AI Summary
This work addresses the challenge of dynamic scene imaging under extremely low-light conditions, where photon scarcity leads to severe noise and texture loss. Existing approaches often overlook the intrinsic noise characteristics of both RAW images and event data. To overcome this limitation, we propose a diffusion model–based hybrid imaging framework that jointly leverages the linear photoresponse of RAW sensors and the high temporal sensitivity of event cameras to brightness changes. Our method introduces, for the first time, a physics-driven dual-modality denoising constraint and incorporates dynamic signal-to-noise ratio estimation to enable adaptive feature fusion. Evaluated on our newly curated REAL dataset—comprising 47,800 pixel-aligned triplets of ultra-low-light RAW images, event streams, and well-exposed references—the proposed approach significantly outperforms state-of-the-art methods across illuminance levels from 0.001 to 0.8 lux, achieving high-quality visual reconstruction in extreme darkness.
📝 Abstract
High-quality imaging of dynamic scenes in extremely low-light conditions is highly challenging. Photon scarcity induces severe noise and texture loss, causing significant image degradation. Event cameras, featuring a high dynamic range (120 dB) and high sensitivity to motion, serve as powerful complements to conventional cameras by offering crucial cues for preserving subtle textures. However, most existing approaches emphasize texture recovery from events, while paying little attention to image noise or the intrinsic noise of events themselves, which ultimately hinders accurate pixel reconstruction under photon-starved conditions. In this work, we propose NEC-Diff, a novel diffusion-based event-RAW hybrid imaging framework that extracts reliable information from heavily noisy signals to reconstruct fine scene structures. The framework is driven by two key insights: (1) combining the linear light-response property of RAW images with the brightness-change nature of events to establish a physics-driven constraint for robust dual-modal denoising; and (2) dynamically estimating the SNR of both modalities based on denoising results to guide adaptive feature fusion, thereby injecting reliable cues into the diffusion process for high-fidelity visual reconstruction. Furthermore, we construct the REAL (Raw and Event Acquired in Low-light) dataset, which provides 47,800 pixel-aligned low-light RAW images, events, and high-quality references under 0.001-0.8 lux illumination. Extensive experiments demonstrate the superiority of NEC-Diff under extreme darkness. The project is available at: https://github.com/jinghan-xu/NEC-Diff.
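The two key insights can be illustrated with a minimal sketch. The function names, the residual-based SNR proxy, and the contrast threshold below are illustrative assumptions for exposition only, not NEC-Diff's actual implementation: events approximate log-intensity changes while RAW values respond linearly to light, so integrated event polarities can be checked against the log ratio of denoised RAW frames, and per-pixel SNR estimates can weight an adaptive fusion.

```python
import numpy as np

def estimate_snr(noisy, denoised, eps=1e-8):
    # Crude per-pixel SNR proxy (an assumption, not the paper's estimator):
    # denoised signal power divided by residual-noise power.
    residual = noisy - denoised
    return (denoised ** 2) / (residual ** 2 + eps)

def event_raw_consistency(raw_t0, raw_t1, polarity_sum, contrast=0.2, eps=1e-8):
    # Physics-driven constraint sketch: RAW is linear in irradiance and events
    # fire on log-brightness changes, so log(raw_t1 / raw_t0) should roughly
    # equal contrast * (signed sum of event polarities) per pixel.
    log_ratio = np.log((raw_t1 + eps) / (raw_t0 + eps))
    return np.abs(log_ratio - contrast * polarity_sum)

def snr_weighted_fusion(feat_raw, feat_evt, snr_raw, snr_evt, eps=1e-8):
    # Adaptive fusion as a per-pixel convex combination: the modality with
    # the higher estimated SNR contributes more to the fused feature.
    w = snr_raw / (snr_raw + snr_evt + eps)
    return w * feat_raw + (1.0 - w) * feat_evt
```

In this toy form, a pixel whose RAW SNR dominates is reconstructed mostly from the RAW feature, while heavily photon-starved pixels lean on event cues; the consistency residual could serve as a dual-modal denoising loss.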