🤖 AI Summary
To address insufficient frame-event modality collaboration and severe spatiotemporal misalignment in low-light image enhancement, this paper proposes a two-stage decoupled framework. In the first stage, amplitude-phase entanglement is modeled in the Fourier domain to restore visibility; in the second stage, the high dynamic range and temporal continuity of event streams are leveraged to refine image structure. A dynamic spatiotemporal alignment fusion mechanism mitigates the mismatch between the two modalities, and spatial-frequency interpolation generates negative samples for a contrastive loss that sharpens feature discriminability. Experiments demonstrate state-of-the-art performance on benchmarks including LOL and ExDark, with significant improvements in PSNR, SSIM, and visual realism. The method balances brightness recovery, detail preservation, and structural fidelity.
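A minimal sketch of the Fourier-domain decomposition the first stage builds on (the actual restoration network is not described here): an image is split into an amplitude spectrum, which largely encodes global illumination and contrast, and a phase spectrum, which largely encodes spatial structure, then recombined. The function names and the identity placeholder are illustrative assumptions, not the paper's architecture.

```python
import torch

def fourier_split(img: torch.Tensor):
    """Split an image tensor (B, C, H, W) into Fourier amplitude and phase."""
    freq = torch.fft.fft2(img, norm="ortho")
    amplitude = torch.abs(freq)   # largely encodes global illumination/contrast
    phase = torch.angle(freq)     # largely encodes spatial structure
    return amplitude, phase

def fourier_merge(amplitude: torch.Tensor, phase: torch.Tensor):
    """Recombine amplitude and phase spectra back into a spatial-domain image."""
    freq = torch.polar(amplitude, phase)  # amplitude * exp(i * phase)
    return torch.fft.ifft2(freq, norm="ortho").real

if __name__ == "__main__":
    low_light = torch.rand(1, 3, 64, 64)
    amp, pha = fourier_split(low_light)
    # A restoration network would predict an adjusted amplitude here;
    # this sketch simply reconstructs the input as a sanity check.
    restored = fourier_merge(amp, pha)
    print(torch.allclose(restored, low_light, atol=1e-5))
```

In the paper's first stage the two components are processed jointly ("amplitude-phase entanglement") rather than keeping the phase fixed as in this simplified sketch.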
📝 Abstract
Event cameras, benefiting from high dynamic range and low latency, offer performance gains for low-light image enhancement. Unlike frame-based cameras, they record intensity changes at extremely high temporal resolution, capturing rich structural information. However, existing event-based methods feed a frame and events directly into a single model without fully exploiting modality-specific advantages, which limits their performance. By analyzing the role of each sensing modality, we therefore decouple the enhancement pipeline into two stages: visibility restoration and structure refinement. In the first stage, we design a visibility restoration network with amplitude-phase entanglement by rethinking the relationship between the amplitude and phase components in Fourier space. In the second stage, we propose a fusion strategy with dynamic alignment that mitigates the spatial mismatch caused by the temporal-resolution discrepancy between the two sensing modalities, refining the structure of the image enhanced in the first stage. In addition, we use spatial-frequency interpolation to simulate negative samples with diverse illumination, noise, and artifact degradations, and develop a contrastive loss that encourages the model to learn discriminative representations. Experiments demonstrate that the proposed method outperforms state-of-the-art models.
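To make the contrastive component concrete, the sketch below shows one way negative samples could be synthesized by interpolating in the spatial and frequency (amplitude-spectrum) domains and then used in an InfoNCE-style loss. The abstract does not give the exact interpolation scheme or loss formulation, so the linear blend, the cosine-similarity features, and the temperature value are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def spatial_frequency_negatives(low: torch.Tensor, normal: torch.Tensor, n_neg: int = 4):
    """Synthesize negatives by interpolating between a low-light image and its
    normal-light reference in both the spatial and the amplitude-spectrum domains."""
    negatives = []
    for _ in range(n_neg):
        t = torch.rand(1).item()
        # Spatial interpolation: intermediate illumination levels.
        spatial_mix = t * low + (1.0 - t) * normal
        # Frequency interpolation: blend amplitude spectra, keep the phase of the mix.
        freq_low, freq_mix = torch.fft.fft2(low), torch.fft.fft2(spatial_mix)
        amp = t * torch.abs(freq_low) + (1.0 - t) * torch.abs(freq_mix)
        negatives.append(torch.fft.ifft2(torch.polar(amp, torch.angle(freq_mix))).real)
    return torch.stack(negatives, dim=1)  # (B, n_neg, C, H, W)

def contrastive_loss(anchor_feat, positive_feat, negative_feats, temperature: float = 0.1):
    """InfoNCE-style loss: pull the enhanced result toward the normal-light
    reference feature and push it away from the interpolated degraded negatives."""
    pos = F.cosine_similarity(anchor_feat, positive_feat, dim=-1) / temperature   # (B,)
    neg = F.cosine_similarity(anchor_feat.unsqueeze(1), negative_feats, dim=-1) / temperature  # (B, n_neg)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)  # positive sits at index 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In this sketch the anchor, positive, and negative features would come from a fixed feature extractor applied to the enhanced output, the normal-light reference, and the interpolated negatives, respectively; the paper's actual feature space and sampling strategy may differ.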