Structure-guided Diffusion Transformer for Low-Light Image Enhancement

📅 2025-04-21
🤖 AI Summary
To address the challenge of jointly preserving fine details and suppressing noise amplification in low-light image enhancement, this paper introduces the Structure-guided Diffusion Transformer (SDTL) framework, the first to incorporate Diffusion Transformers (DiT) into this task. Methodologically, it proposes a Structure Enhancement Module (SEM) and a Structure-guided Attention Block (SAB) and operates on wavelet-compressed features to improve inference efficiency, while leveraging structural priors for accurate texture reconstruction and robust noise suppression. An adaptive multi-scale fusion strategy is further introduced to aggregate hierarchical structural information. Extensive experiments on benchmark datasets, including LOL and SID, demonstrate state-of-the-art performance, with significant improvements in brightness consistency, detail fidelity, and perceptual quality. The results support both the effectiveness and the novelty of structural guidance in DiT-based low-light enhancement.
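The summary does not spell out how the adaptive multi-scale fusion works. The sketch below shows one common form, under stated assumptions: a feature pyramid whose coarser levels are upsampled to full resolution and then blended with softmax-normalised scalar weights. The function names, the nearest-neighbour upsampling and the per-level logits (which a real model would learn) are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical sketch of adaptive multi-scale fusion; not SDTL's published rule.


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2D map (list of rows) by an integer factor."""
    return [
        [v for v in row for _col in range(factor)]
        for row in feat
        for _row in range(factor)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(pyramid, logits):
    """Blend a feature pyramid into one full-resolution map.

    pyramid[k] is a 2D map at 1 / 2**k of the finest resolution.
    logits holds one (hypothetical, normally learned) score per level;
    softmax turns them into non-negative weights that sum to 1.
    """
    aligned = [upsample_nearest(f, 2 ** k) if k else f for k, f in enumerate(pyramid)]
    weights = softmax(logits)
    h, w = len(aligned[0]), len(aligned[0][0])
    fused = [
        [sum(wt * f[i][j] for wt, f in zip(weights, aligned)) for j in range(w)]
        for i in range(h)
    ]
    return fused, weights


if __name__ == "__main__":
    fine = [[1.0, 1.0], [1.0, 1.0]]  # level 0: 2x2
    coarse = [[3.0]]                 # level 1: 1x1, upsampled to 2x2 before blending
    fused, weights = fuse_multiscale([fine, coarse], [0.0, 0.0])
    print(weights)  # equal logits give equal weights
    print(fused)
```

Normalising with a softmax keeps the fused map a convex combination of the levels, so no single scale can blow up the output magnitude; a learned variant would typically predict the logits per channel or per pixel rather than per level.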

📝 Abstract
While the diffusion transformer (DiT) has become a focal point of interest in recent years, its application to low-light image enhancement remains unexplored. Current methods recover details from low-light images but inevitably amplify the noise in them, resulting in poor visual quality. In this paper, we are the first to introduce DiT into the low-light enhancement task, and we design a novel Structure-guided Diffusion Transformer based Low-light image enhancement (SDTL) framework. We compress features through the wavelet transform to improve the inference efficiency of the model and to capture multi-directional frequency bands. We then propose a Structure Enhancement Module (SEM) that uses structural priors to enhance texture and leverages an adaptive fusion strategy to achieve more accurate enhancement. In addition, we propose a Structure-guided Attention Block (SAB) that pays more attention to texture-rich tokens and avoids interference from noisy areas during noise prediction. Extensive qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art (SOTA) performance on several popular datasets, validating the effectiveness of SDTL in improving image quality and the potential of DiT in low-light enhancement tasks.
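The abstract does not state which wavelet or how many decomposition levels SDTL uses. As a minimal sketch, assuming a single-level orthonormal Haar transform on a grayscale map, the code below shows how one level splits an image into a low-frequency band and three directional detail bands at half resolution. Stacking the four bands as channels keeps all of the information while the spatial grid, and hence the number of patch tokens, shrinks by a factor of four. The function names and the patch size used in the token count are illustrative only.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of a grayscale map given as a list of rows.

    Returns (ll, lh, hl, hh), each at half the input height and width.
    Each 2x2 block is projected onto the orthonormal Haar basis, so the
    total energy (sum of squares) of the four bands equals that of the input.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # local average: coarse content
            r_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            r_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def num_tokens(h, w, patch):
    """Number of non-overlapping patch tokens for an h x w map."""
    return (h // patch) * (w // patch)


if __name__ == "__main__":
    img = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
    ll, lh, hl, hh = haar_dwt2(img)
    print("LL:", ll)
    print("HH:", hh)
    # With a hypothetical patch size of 2, a 256x256 input gives 16384 tokens;
    # patching the 128x128 sub-band grid instead gives 4096.
    print(num_tokens(256, 256, 2), num_tokens(128, 128, 2))
```

Because the Haar basis is orthonormal, the transform is exactly invertible, so the compression trades spatial resolution for channels rather than discarding detail. Libraries such as PyWavelets offer other wavelet families if a smoother basis is preferred.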
Problem

Research questions and friction points this paper is trying to address.

Applying diffusion transformers to low-light image enhancement
Reducing noise amplification during image detail recovery
Improving visual quality with structure-guided attention mechanisms
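Why detail recovery tends to amplify noise can be seen with a deliberately simplified model (an illustration only, not the SDTL formulation): treat the low-light observation as clean signal plus zero-mean noise, and brighten it with a global gain g.

```latex
\begin{aligned}
y &= x + n, \qquad \mathbb{E}[n] = 0,\quad \operatorname{Var}(n) = \sigma^2,\\
\hat{x} &= g\,y = g\,x + g\,n, \qquad \operatorname{Std}(g\,n) = g\,\sigma \quad (g > 0).
\end{aligned}
```

With g = 10 and σ = 2 intensity levels, the noise standard deviation in the brightened image becomes 20 levels. The gain lifts detail and noise alike, which is why an enhancement method that recovers detail must also suppress noise explicitly.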
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-guided Diffusion Transformer for enhancement
Wavelet transform for efficient feature compression
Structure-guided Attention Block for noise reduction
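The abstract states only that the SAB attends more to texture-rich tokens and avoids noisy areas during noise prediction; it does not give the formula. One simple way to realise that idea, sketched here purely as an assumption, is to add the log of a per-token structure score to the attention logits, which multiplies each key's unnormalised weight by its score raised to a strength alpha. The structure scores, `alpha`, `eps` and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def structure_biased_attention(query, keys, values, structure, alpha=1.0, eps=1e-6):
    """Single-query scaled dot-product attention with an additive structure prior.

    structure: one non-negative texture score per key token (hypothetical input,
        e.g. pooled from a structure/edge map); low scores mark flat or noisy tokens.
    alpha: prior strength (hypothetical); alpha = 0 gives plain attention.
    Adding alpha * log(s + eps) to a logit scales that key's unnormalised
    weight by (s + eps) ** alpha before softmax normalisation.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + alpha * math.log(s + eps)
        for key, s in zip(keys, structure)
    ]
    weights = softmax(logits)
    out = [
        sum(wt * v[d] for wt, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return out, weights


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [1.0, 0.0]]  # identical keys: content alone cannot choose
    values = [[10.0], [0.0]]
    texture = [3.0, 1.0]             # first token judged three times more textured
    out, w = structure_biased_attention(q, keys, values, texture)
    print(w, out)
```

An additive bias of this kind leaves the attention rows normalised and reduces to standard attention when `alpha` is 0; in tensor frameworks the same effect is usually obtained by passing the bias as a float attention mask.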
Xiangchen Yin
University of Science and Technology of China (USTC)
AIGC, Diffusion Model, Image/Video Generation
Zhenda Yu
Anhui University, China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
Longtao Jiang
University of Science and Technology of China
Diffusion model, Computer Vision, Multimodal retrieval
Xin Gao
School of Vehicle and Mobility, Tsinghua University, China, State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing, China
Xiao Sun
School of Computer Science and Information Engineering, Hefei University of Technology, China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
Zhi Liu
Department of Computer and Network Engineering, The University of Electro-Communications, Chofu-shi, Tokyo, 1828585 Japan
Xun Yang
University of Science and Technology of China