🤖 AI Summary
This study addresses the problem of missing MRI modalities in nasopharyngeal carcinoma (NPC) radiotherapy planning, often caused by patient discomfort, high scanning costs, and prolonged acquisition times. To this end, the authors propose the first unified any-to-all MRI synthesis framework tailored for NPC radiotherapy. By using contrastive learning to extract modality-invariant representations and integrating a CLIP-driven text-guided decoder, the model enables semantically consistent and controllable synthesis of all target MRI modalities. Trained on 40,825 images from 13 institutions and evaluated across 26 internal and external validation sites comprising 15,748 images, the framework achieves an average SSIM of 0.90 and PSNR of 27, with superior synthesis fidelity and robustness to noise and domain shifts. It also improves performance in downstream radiotherapy tasks such as segmentation, while demonstrating strong anatomical adaptability, clinical interpretability, and cross-modal generalization capability.
📝 Abstract
Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints such as patient discomfort, long scan times, and high costs often lead to incomplete modalities in clinical practice, compromising RT planning accuracy. Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lacking in clinical interpretability, so they fail to meet the needs of NPC RT. Here, we developed a unified foundation model that integrates contrastive visual representation learning and vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model uses a contrastive encoder to learn modality-invariant representations and a CLIP-based text-informed decoder for semantically consistent synthesis, so a single model can synthesize all target modalities from any available input modality. Trained on 40,825 images from 13 institutions, it achieves consistently high performance (average SSIM 0.90, PSNR 27) across 26 internal/external validation sites (15,748 images), with superior synthesis fidelity and robustness to noise and domain shifts. In addition, its unified representation improves downstream RT-relevant tasks (e.g., segmentation). This work advances digital medicine solutions for NPC care by leveraging foundation models to bridge technical synthesis and clinical utility.
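For readers unfamiliar with the reported fidelity metrics, the following is a minimal sketch of how a PSNR value is conventionally computed between a reference image and a synthesized one. It is not the authors' evaluation code: the function name `psnr`, the flattened-list input format, and the assumption that intensities are normalized to a `data_range` of 1.0 are all illustrative choices that the abstract does not specify.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         synthesized: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for two flattened images.

    `data_range` is the assumed span of valid intensities
    (1.0 here, i.e. images normalized to [0, 1]; this is an assumption).
    """
    if len(reference) != len(synthesized) or len(reference) == 0:
        raise ValueError("images must be non-empty and equally sized")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0.0:
        return math.inf

    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy 2x2 "image" flattened to a list, with every pixel off by 0.1.
reference = [0.0, 0.5, 1.0, 0.25]
synthesized = [0.1, 0.4, 0.9, 0.35]
score = psnr(reference, synthesized)
```

Under this definition, and if the reported PSNR of 27 is expressed in dB, it corresponds to a root-mean-square error of roughly 4.5% of the intensity range, since 10^(-27/20) ≈ 0.045.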