🤖 AI Summary
Low-light stereo images suffer from coupled degradations, and existing enhancement methods encode all degradation factors in a single entangled latent space, yielding black-box models prone to shortcut learning on spurious correlations. To address this, we propose a wavelet-domain feature disentanglement framework that explicitly decomposes enhancement into low-frequency illumination correction and high-frequency texture restoration. We introduce a high-frequency-guided cross-view interaction module (HF-CIM) that exchanges detail cues between the two views within the high-frequency branches, and a cross-attention-based detail and texture enhancement module (DTEM) that refines high-frequency information. Leveraging multi-level wavelet decomposition, our method achieves significant improvements in illumination uniformity and high-frequency detail recovery on both synthetic and real-world datasets, and it generalizes well across diverse low-light stereo scenarios. The source code and benchmark dataset are publicly available.
📝 Abstract
Low-light images suffer from complex degradation, and existing enhancement methods often encode all degradation factors within a single latent space. This leads to highly entangled features and strong black-box characteristics, making the model prone to shortcut learning. To mitigate these issues, this paper proposes a wavelet-based low-light stereo image enhancement method with feature-space decoupling. Our insight comes from two findings: (1) the wavelet transform enables independent processing of low-frequency and high-frequency information, and (2) illumination can be adjusted by modifying the low-frequency component of a low-light image extracted through multi-level wavelet decomposition. Accordingly, the wavelet transform decomposes the feature space into a low-frequency branch for illumination adjustment and multiple high-frequency branches for texture enhancement. Additionally, stereo low-light image enhancement can exploit useful cues from the other view. To this end, we propose a novel high-frequency-guided cross-view interaction module (HF-CIM) that operates within the high-frequency branches rather than across the entire feature space, effectively extracting valuable image details from the other view. Furthermore, to enhance high-frequency information, we propose a detail and texture enhancement module (DTEM) based on a cross-attention mechanism. The model is trained on a dataset containing images with both uniform and non-uniform illumination. Experimental results on real and synthetic images show that our algorithm offers significant advantages in illumination adjustment while effectively recovering high-frequency information. The code and dataset are publicly available at: https://github.com/Cherisherr/WDCI-Net.git.
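To make the core insight concrete, below is a minimal sketch (not the released WDCI-Net implementation) of illumination adjustment in the wavelet domain: a low-light image is decomposed with a multi-level 2-D DWT, only the low-frequency sub-band is brightened, and the image is reconstructed with the high-frequency sub-bands untouched. It assumes PyWavelets and a single-channel image normalized to [0, 1]; the wavelet choice, decomposition depth, and gamma value are illustrative placeholders rather than the paper's settings.

```python
# Sketch of the decoupling insight: illumination lives mainly in the
# low-frequency sub-band of a multi-level wavelet decomposition, while
# high-frequency sub-bands carry texture and detail.
import numpy as np
import pywt


def wavelet_illumination_adjust(img, levels=3, gamma=0.45):
    """Brighten a low-light image by correcting only its low-frequency band."""
    # Multi-level 2-D DWT: coeffs = [cA_L, (cH_L, cV_L, cD_L), ..., (cH_1, cV_1, cD_1)]
    coeffs = pywt.wavedec2(img, wavelet="haar", level=levels)
    low = coeffs[0]        # low-frequency (illumination-dominant) sub-band
    high = coeffs[1:]      # high-frequency (texture/detail) sub-bands

    # Gamma-style lifting of the low-frequency band only.
    # With pywt's Haar filters, approximation coefficients grow by ~2 per level.
    scale = 2.0 ** levels
    low_adjusted = scale * np.power(np.clip(low / scale, 0.0, 1.0), gamma)

    # High-frequency bands are left as-is here; in the paper they are instead
    # refined by dedicated branches (HF-CIM / DTEM).
    out = pywt.waverec2([low_adjusted] + high, wavelet="haar")
    return np.clip(out, 0.0, 1.0)


# Example: a dim synthetic gradient becomes brighter while its edges stay sharp.
dark = 0.15 * np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
bright = wavelet_illumination_adjust(dark)
print(dark.mean(), bright.mean())  # mean intensity increases after adjustment
```

In the full method this per-band processing is done on learned features rather than raw coefficients, which is what allows the low-frequency branch to handle illumination and the high-frequency branches to handle texture independently.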