🤖 AI Summary
The robustness of learning-based fluence map prediction under clinical distribution shifts is not well established, which limits its reliability in intensity-modulated radiation therapy (IMRT). This work studies a two-stage transformer pipeline: the first stage predicts dose from anatomy (CT and contours), and the second generates beamlet fluence maps, trained with a physics-informed loss that enforces energy consistency. Robustness is evaluated under geometric perturbations, radiometric noise, reduced training data, and domain shifts. Performance degrades smoothly under moderate perturbations but fails sharply under severe rotations and noise. Models with hierarchical attention, such as SwinUNETR, show slower growth in upper-quartile energy error, indicating improved robustness. SSIM alone fails to capture clinically relevant errors, which motivates physics-informed evaluation.
📝 Abstract
Learning-based fluence map prediction offers a fast alternative to iterative inverse planning in intensity-modulated radiation therapy (IMRT), but its robustness under realistic distribution shifts remains unclear. We study a two-stage transformer pipeline that maps anatomy (CT and contours) to dose and then to beamlet fluence maps. We compare fluence-stage transformer backbones with hierarchical, global, and hybrid attention, trained with a physics-informed loss enforcing energy consistency. Robustness is evaluated under geometric perturbations, radiometric noise, reduced training data, and domain shifts using a prostate IMRT dataset, with additional evaluation of the dose stage on public datasets. Results show smooth degradation under moderate perturbations but sharp failures under severe rotations and noise. Hierarchical transformers (e.g., SwinUNETR) exhibit slower growth in upper-quartile energy error, indicating improved robustness. We further show that SSIM alone fails to capture clinically relevant errors, highlighting the need for physics-informed evaluation.
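The abstract does not give the exact form of the energy-consistency term or the energy-error metric, so the following is only a minimal NumPy sketch under stated assumptions: "energy" is taken to be the total fluence summed over all beamlets of a sample, the loss is a plain MSE plus a weighted relative energy mismatch, and the robustness statistic is the 75th percentile of the per-sample energy error. The function names, the `weight` value, and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def relative_energy_error(pred, target, eps=1e-8):
    """Per-sample relative mismatch in total fluence.

    Assumption: "energy" = sum of fluence over all non-batch axes.
    pred, target: arrays of shape (batch, ...).
    """
    axes = tuple(range(1, pred.ndim))
    e_pred = pred.sum(axis=axes)
    e_true = target.sum(axis=axes)
    return np.abs(e_pred - e_true) / (np.abs(e_true) + eps)


def energy_consistent_loss(pred, target, weight=0.1):
    """MSE plus a weighted energy-consistency penalty (hypothetical form)."""
    mse = np.mean((pred - target) ** 2)
    return float(mse + weight * relative_energy_error(pred, target).mean())


def upper_quartile_energy_error(pred, target):
    """75th percentile of the per-sample relative energy error."""
    return float(np.percentile(relative_energy_error(pred, target), 75))


# Toy data standing in for fluence maps: 16 samples of 32x32 beamlets.
rng = np.random.default_rng(0)
target = rng.random((16, 32, 32))

# A prediction that is uniformly 10% too intense has the same spatial
# pattern as the target but a 10% energy error for every sample.
scaled = 1.1 * target
q75_scaled = upper_quartile_energy_error(scaled, target)

# Additive Gaussian noise as a crude stand-in for radiometric noise.
noisy = target + rng.normal(0.0, 0.05, size=target.shape)
q75_noisy = upper_quartile_energy_error(noisy, target)

print(f"upper-quartile energy error, 10% scaling: {q75_scaled:.4f}")
print(f"upper-quartile energy error, additive noise: {q75_noisy:.4f}")
```

The uniformly scaled case illustrates why a structure-oriented metric can miss clinically relevant errors: the spatial pattern is unchanged while the total delivered energy is off by a fixed fraction. Tracking an upper percentile rather than the mean emphasizes the worst-performing cases as perturbation severity increases.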