🤖 AI Summary
This study addresses Hounsfield Unit (HU) inaccuracies in cone-beam computed tomography (CBCT) images caused by scatter, noise, and reconstruction artifacts, which limit the use of CBCT in radiotherapy tasks that require quantitative precision. The authors propose EPC-3D-Diff, a conditional 3D latent diffusion model that builds the physical equivariance between in-plane volume rotation and projection-angle shift directly into the diffusion objective. Enforcing this imaging-physics constraint in the projection domain is intended to improve the physical consistency and generalization of the synthesized CT images. The model uses a lightweight 3D autoencoder to construct a compact latent space and combines a projection-domain equivariance loss with conditional latent diffusion for efficient training. Evaluated on paired phantom and clinical data, the approach reports PSNR gains of 7.4 dB and 1.8 dB, respectively, over existing methods, together with improved SSIM and tissue-specific HU accuracy.
📝 Abstract
Cone-beam CT (CBCT) is routinely acquired during radiotherapy for patient setup, but its quantitative reliability is degraded by scatter, noise, and reconstruction artifacts, limiting Hounsfield Unit (HU) accuracy. We propose EPC-3D-Diff, a novel conditional 3D latent diffusion framework for volumetric CBCT-to-CT synthesis that introduces a projection-domain equivariance loss derived from acquisition physics. Unlike common image-domain equivariance, we exploit the fact that an in-plane rotation of the volume corresponds to an angular shift in its projections. During training, we enforce this relationship by forward-projecting rotated synthesized CT volumes and matching them to appropriately angle-shifted projections of the paired target CT, yielding a physics-consistent equivariance constraint integrated into the diffusion objective. To capture full 3D context efficiently, conditional diffusion is performed in a compact latent space learned by a lightweight 3D autoencoder, which preserves axial depth while downsampling in-plane resolution for stable training. We validate on a paired head CBCT/CT phantom dataset, including repeat scans, and on paired clinical data using patient-wise splits, and we perform single- and mixed-domain training, ablations, and comparisons with diffusion and CycleGAN baselines. EPC-3D-Diff generalizes well and achieves substantial PSNR improvements of +7.4 dB (phantom) and +1.8 dB (clinical data) over state-of-the-art methods, alongside improved SSIM and HU accuracy within tissue boundaries. Overall, EPC-3D-Diff improves robustness and physics consistency, supporting HU-aware synthesis for downstream radiotherapy workflows.
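The rotation-to-angle-shift relationship behind the equivariance loss can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a toy parallel-beam projector with only four views (0°, 90°, 180°, 270°) on square slices, so that the identity holds exactly, and it assumes an L1 mismatch as the loss. The abstract does not specify the actual cone-beam geometry, the set of angles, or the form of the loss. All function names (`rotate_in_plane`, `forward_project`, `shift_angles`, `equivariance_loss`) are hypothetical.

```python
import numpy as np

N_ANGLES = 4  # assumed toy geometry: views at 0, 90, 180 and 270 degrees


def rotate_in_plane(vol, m):
    """Rotate every axial slice of a (D, N, N) volume by m * 90 degrees."""
    return np.rot90(vol, k=m, axes=(1, 2))


def forward_project(vol):
    """Toy parallel-beam projector returning a (N_ANGLES, D, N) stack of views.

    View k is formed by rotating the volume by -k * 90 degrees and
    integrating along the row axis of each slice.
    """
    return np.stack(
        [rotate_in_plane(vol, -k).sum(axis=1) for k in range(N_ANGLES)]
    )


def shift_angles(proj, m):
    """Shift the view index by m steps: output view k is input view k - m."""
    return np.roll(proj, shift=m, axis=0)


def equivariance_loss(synth_ct, target_ct, m):
    """Assumed L1 mismatch between projections of the rotated synthetic
    volume and the angle-shifted projections of the paired target volume."""
    lhs = forward_project(rotate_in_plane(synth_ct, m))
    rhs = shift_angles(forward_project(target_ct), m)
    return float(np.mean(np.abs(lhs - rhs)))


rng = np.random.default_rng(0)
target = rng.random((3, 8, 8))                 # stand-in for a paired CT volume
synth = target + 0.1 * rng.random((3, 8, 8))   # stand-in for an imperfect synthesis

for m in range(N_ANGLES):
    # Rotating the volume is equivalent to shifting the projection angles.
    assert np.allclose(
        forward_project(rotate_in_plane(target, m)),
        shift_angles(forward_project(target), m),
    )

print(equivariance_loss(target, target, m=1))  # synthesis equal to the target
print(equivariance_loss(synth, target, m=1))   # imperfect synthesis
```

In this toy setting the loss is numerically zero when the synthesized volume equals the target and grows as the two diverge. A realistic version would replace `forward_project` with a differentiable cone-beam projector and allow arbitrary rotation angles, where interpolation makes the identity only approximate.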