🤖 AI Summary
To address the anatomical fidelity challenge in MRI-to-CT and CBCT-to-CT cross-modal 3D synthesis for adaptive radiotherapy, this work proposes a fully ConvNeXt-driven 3D conditional GAN framework. Methodologically, it employs a U-shaped ConvNeXt generator and a multi-head segmentation discriminator jointly optimized with Dice loss and cross-entropy; introduces a segmentation-guided masked MAE loss; and integrates perceptual, adversarial, and MAE losses. Sliding-window inference with average-fold reconstruction ensures volumetric consistency. Key contributions include: (i) the first fully ConvNeXt-based 3D GAN architecture; (ii) a masked MAE loss enabling dual fidelity—structural and tissue-level; and (iii) a multi-head segmentation discriminator enhancing anatomical specificity. The method achieves stable convergence without fine-tuning on multi-center data (3,000 epochs for MRI-to-CT; 1,000 for CBCT-to-CT), yielding clinically acceptable CT synthesis accuracy for radiotherapy dose calculation.
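As a rough illustration of the segmentation-guided masked MAE mentioned above, the sketch below averages the absolute error only over voxels that a segmentation mask marks as foreground, and adds it to the plain MAE. The function names, the flat-list volume representation, and the weight `w_masked` are assumptions made for this example; the source does not specify how the mask is built or how the loss terms are weighted.

```python
def mae(pred, target):
    """Plain mean absolute error over all voxels (flat, equal-length sequences)."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def masked_mae(pred, target, mask):
    """MAE restricted to voxels where `mask` is nonzero.

    Returns 0.0 for an empty mask so the term simply drops out.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must be equally long")
    errors = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def reconstruction_loss(pred, target, mask, w_masked=1.0):
    """Global MAE plus a weighted masked MAE.

    `w_masked` is a hypothetical weight, not a value taken from the paper.
    """
    return mae(pred, target) + w_masked * masked_mae(pred, target, mask)
```

In this toy form the masked term raises the penalty for errors inside the segmented structures, while the global MAE still constrains the rest of the volume.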
📝 Abstract
The synthesis of computed tomography (CT) from magnetic resonance imaging (MRI) and cone-beam CT (CBCT) plays a critical role in clinical treatment planning by enabling accurate anatomical representation in adaptive radiotherapy. In this work, we propose GANeXt, a 3D patch-based, fully ConvNeXt-powered generative adversarial network for unified CT synthesis across different modalities and anatomical regions. Specifically, GANeXt employs an efficient U-shaped generator constructed from stacked 3D ConvNeXt blocks with compact convolution kernels, while the discriminator adopts a conditional PatchGAN. To improve synthesis quality, we combine several loss functions: mean absolute error (MAE), perceptual loss, segmentation-based masked MAE, and adversarial loss, together with a combination of Dice loss and cross-entropy for the multi-head segmentation discriminator. For both tasks, training is performed with a batch size of 8 using two separate AdamW optimizers for the generator and discriminator, each equipped with a warmup and cosine decay scheduler, with learning rates of $5\times10^{-4}$ and $1\times10^{-3}$, respectively. Data preprocessing includes deformable registration, foreground cropping, percentile normalization for the input modality, and linear normalization of the CT to the range $[-1024, 1000]$. Data augmentation involves random zooming within $(0.8, 1.3)$ (for MRI-to-CT only), fixed-size cropping to $32\times160\times192$ for MRI-to-CT and $32\times128\times128$ for CBCT-to-CT, and random flipping. During inference, we apply a sliding-window approach with $0.8$ overlap and average folding to reconstruct the full-size synthetic CT (sCT), followed by inversion of the CT normalization. The models are trained jointly on all anatomical regions using the full training dataset, without any fine-tuning; the final models are those obtained at the end of 3000 epochs for MRI-to-CT and 1000 epochs for CBCT-to-CT.
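The sliding-window inference with average folding described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the volume is a flat list in (z, y, x) row-major order, `predict` is a hypothetical stand-in for the trained generator, and the rule for placing windows (a stride of roughly `patch * (1 - overlap)`, with a final window flush against the border) is one common convention that the source does not confirm.

```python
import itertools


def window_starts(size, patch, overlap):
    """Start indices of length-`patch` windows that together cover [0, size)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must lie in [0, 1)")
    if patch > size:
        raise ValueError("pad the volume so each axis is at least one patch long")
    step = max(1, round(patch * (1.0 - overlap)))
    starts = list(range(0, size - patch + 1, step))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # final window sits flush with the border
    return starts


def sliding_window_average(volume, shape, patch, overlap, predict):
    """Run `predict` on overlapping patches and average where they overlap.

    volume  : flat list of D*H*W values in (z, y, x) row-major order
    predict : hypothetical stand-in for the generator; maps a flat patch
              to a flat patch of the same length
    """
    depth, height, width = shape
    pd, ph, pw = patch
    total = [0.0] * len(volume)  # summed predictions per voxel
    count = [0] * len(volume)    # number of windows covering each voxel
    axes = (
        window_starts(depth, pd, overlap),
        window_starts(height, ph, overlap),
        window_starts(width, pw, overlap),
    )
    for z0, y0, x0 in itertools.product(*axes):
        idx = [
            (z * height + y) * width + x
            for z in range(z0, z0 + pd)
            for y in range(y0, y0 + ph)
            for x in range(x0, x0 + pw)
        ]
        out = predict([volume[i] for i in idx])
        for i, value in zip(idx, out):
            total[i] += value
            count[i] += 1
    # Average folding: divide the accumulated sum by the coverage count.
    return [t / c for t, c in zip(total, count)]
```

A quick sanity check is to pass an identity function as `predict`: the averaged output should then reproduce the input volume, since every overlapping window contributes the same value at each voxel. In practice a tensor library would replace the flat lists, and the inverse CT normalization would be applied to the averaged result.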