🤖 AI Summary
Low-dose CT (LDCT) suffers from severe quantum and electronic noise, which causes artifacts and loss of fine anatomical detail and thereby compromises diagnostic accuracy. To address this, we propose a frequency-guided diffusion Transformer model. First, a frequency-decoupling mechanism concentrates noise injection and progressive denoising in the high-frequency subbands. Second, a dynamic learnable fusion strategy combined with sliding sparse local attention jointly models global semantics and preserves local textures. Third, skip connections and a hybrid denoising network enhance reconstruction stability. Evaluated on the LUNA16 and AAPM low-dose CT datasets, our method achieves state-of-the-art performance under ultra-low radiation doses, improving PSNR by 2.1 dB and SSIM by 0.032 over existing diffusion- and Transformer-based approaches. It suppresses artifacts more completely and better recovers subtle anatomical structures and textural details, thereby enhancing clinical diagnostic reliability.
📝 Abstract
Low-dose computed tomography (LDCT) reduces radiation exposure but suffers from image artifacts and loss of detail caused by quantum and electronic noise, which can compromise diagnostic accuracy. Combining Transformers with diffusion models is a promising approach to image generation. Nevertheless, existing methods are limited in their ability to preserve fine-grained image details. To address this issue, we propose a frequency domain-directed diffusion transformer (FD-DiT) for LDCT reconstruction. FD-DiT centers on a diffusion strategy that progressively adds noise until the distribution statistically aligns with that of LDCT data and then denoises the result. A frequency decoupling technique concentrates the noise primarily in the high-frequency domain, which facilitates the capture of essential anatomical structures and fine details. A hybrid denoising network then optimizes the overall reconstruction process. To better recognize high-frequency noise, we incorporate sliding sparse local attention, which exploits the sparsity and locality of shallow-layer information, and propagate this information via skip connections to improve feature representation. Finally, we propose a learnable dynamic fusion strategy for optimal component integration. Experimental results demonstrate that, at identical dose levels, LDCT images reconstructed by FD-DiT exhibit superior noise and artifact suppression compared with state-of-the-art methods.
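The frequency decoupling idea described above can be illustrated with a minimal sketch. This is not the FD-DiT implementation: it assumes a one-level 2D Haar wavelet split as the decoupling step and plain Gaussian noise with a fixed standard deviation, neither of which the abstract specifies. The helper names `haar_dwt2`, `haar_idwt2`, and `noise_high_freq` are hypothetical and exist only for this illustration.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar transform (assumed stand-in for the paper's decoupling).

    x must have even height and width. Returns the low-frequency band and a
    tuple of the three high-frequency bands.
    """
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # local averages: coarse structure
    lh = (a - b + c - d) / 2.0  # differences between columns
    hl = (a + b - c - d) / 2.0  # differences between rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def noise_high_freq(x, sigma, rng):
    """Add Gaussian noise to the high-frequency bands only (illustrative step)."""
    ll, highs = haar_dwt2(x)
    noisy_highs = tuple(h + sigma * rng.standard_normal(h.shape) for h in highs)
    # The low-frequency band is passed through untouched.
    return haar_idwt2(ll, noisy_highs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))  # placeholder for a CT slice
    noisy = noise_high_freq(image, sigma=0.5, rng=rng)
    ll_clean, _ = haar_dwt2(image)
    ll_noisy, _ = haar_dwt2(noisy)
    # Low-frequency content should be unchanged by the noising step.
    print(np.allclose(ll_clean, ll_noisy))
```

In this sketch the low-frequency band, which carries the coarse anatomical layout, is identical before and after noising, so a denoiser would only need to restore the high-frequency bands. How FD-DiT schedules the noise across diffusion steps and which transform it actually uses are not stated in the abstract.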