🤖 AI Summary
To jointly preserve fine details and suppress noise amplification in low-light image enhancement, this paper introduces the Structure-guided Diffusion Transformer based Low-light image enhancement (SDTL) framework, which the authors describe as the first to bring Diffusion Transformers (DiT) to this task. The method compresses features with a wavelet transform to improve inference efficiency, and proposes a Structure Enhancement Module (SEM) and a Structure-guided Attention Block (SAB) that use structural priors for accurate texture reconstruction and robust noise suppression. An adaptive multi-scale fusion strategy aggregates hierarchical structural information. Experiments on benchmark datasets, including LOL and SID, report state-of-the-art performance, with improvements in brightness consistency, detail fidelity, and perceptual quality. The results support the effectiveness of structural guidance in DiT-based low-light enhancement.
📝 Abstract
While the diffusion transformer (DiT) has attracted considerable interest in recent years, its application to low-light image enhancement remains unexplored. Current methods recover details from low-light images but inevitably amplify the noise in them, resulting in poor visual quality. In this paper, we are the first to introduce DiT into the low-light enhancement task, and we design a novel Structure-guided Diffusion Transformer based Low-light image enhancement (SDTL) framework. We compress the features with a wavelet transform to improve the inference efficiency of the model and to capture multi-directional frequency bands. We then propose a Structure Enhancement Module (SEM) that uses a structural prior to enhance texture and leverages an adaptive fusion strategy to achieve more accurate enhancement. In addition, we propose a Structure-guided Attention Block (SAB) that pays more attention to texture-rich tokens and avoids interference from noisy areas during noise prediction. Extensive qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art (SOTA) performance on several popular datasets, validating the effectiveness of SDTL in improving image quality and the potential of DiT in low-light enhancement tasks.
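To make the wavelet compression step concrete, the following is a minimal sketch of a single-level 2D wavelet decomposition. It is not the authors' implementation: the abstract does not name the wavelet, so the Haar basis is an assumption here, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. The sketch shows how one transform level halves the spatial resolution while splitting a feature map into one low-frequency band and three directional detail bands, and that the step is invertible.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform over the last two axes.

    x has shape (..., H, W) with even H and W (assumed for this sketch).
    Returns four sub-bands, each of shape (..., H/2, W/2).
    """
    # The four pixels of every non-overlapping 2x2 block.
    a = x[..., 0::2, 0::2]  # top-left
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # differences along the width axis
    hl = (a + b - c - d) / 2.0  # differences along the height axis
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (..., H, W) array exactly."""
    h, w = ll.shape[-2] * 2, ll.shape[-1] * 2
    out = np.empty(ll.shape[:-2] + (h, w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


# Usage: a hypothetical 3-channel 8x8 feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8))
ll, lh, hl, hh = haar_dwt2(features)

# Stacking the sub-bands along the channel axis gives 4x the channels
# at a quarter of the spatial positions.
compressed = np.concatenate([ll, lh, hl, hh], axis=-3)
restored = haar_idwt2(ll, lh, hl, hh)
```

Under these assumptions, a model operating on `compressed` processes a quarter of the spatial positions per level, which is one plausible reading of the efficiency gain the abstract mentions. The three detail bands carry the direction-specific frequency content.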