🤖 AI Summary
In radiotherapy planning, predicting multi-beam fluence maps from volumetric anatomy is an ill-posed inverse problem. Existing CNN-based methods struggle to capture long-range dependencies and often yield structurally inconsistent or physically unrealizable plans. This paper proposes FluenceFormer, a two-stage Transformer framework: Stage 1 predicts a global dose prior from anatomical inputs, and Stage 2 conditions that prior on explicit beam geometry to regress physically calibrated fluence maps. Training uses the Fluence-Aware Regression (FAR) loss, a physics-informed objective that combines voxel-level fidelity, gradient smoothness, structural consistency, and beam-wise energy conservation. The framework is backbone-agnostic and is evaluated with Swin UNETR, UNETR, nnFormer, and MedFormer on a prostate IMRT dataset. With the Swin UNETR backbone, it reduces Energy Error to 4.5% and yields statistically significant gains in structural fidelity over CNN and single-stage baselines (p < 0.05).
📝 Abstract
Fluence map prediction is central to automated radiotherapy planning but remains an ill-posed inverse problem due to the complex relationship between volumetric anatomy and beam-intensity modulation. Convolutional methods in prior work often struggle to capture long-range dependencies, which can lead to structurally inconsistent or physically unrealizable plans. We introduce \textbf{FluenceFormer}, a backbone-agnostic transformer framework for direct, geometry-aware fluence regression. The model uses a unified two-stage design: Stage~1 predicts a global dose prior from anatomical inputs, and Stage~2 conditions this prior on explicit beam geometry to regress physically calibrated fluence maps. Central to the approach is the \textbf{Fluence-Aware Regression (FAR)} loss, a physics-informed objective that integrates voxel-level fidelity, gradient smoothness, structural consistency, and beam-wise energy conservation. We evaluate the generality of the framework across multiple transformer backbones, including Swin UNETR, UNETR, nnFormer, and MedFormer, using a prostate IMRT dataset. FluenceFormer with Swin UNETR achieves the strongest performance among the evaluated models and improves over existing benchmark CNN and single-stage methods, reducing Energy Error to $\mathbf{4.5\%}$ and yielding statistically significant gains in structural fidelity ($p < 0.05$).
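The abstract names the four ingredients of the FAR loss but does not give its formula. The NumPy sketch below shows one plausible way such a composite objective could be assembled; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `far_loss`, the `(beams, H, W)` array layout, the weights, the choice of L1 for the voxel term, reading "gradient smoothness" as matching finite differences, and the use of per-beam Pearson correlation as a simple stand-in for a structural-similarity term.

```python
import numpy as np


def far_loss(pred, target, weights=(1.0, 0.5, 0.5, 0.1), eps=1e-8):
    """Illustrative four-term loss over fluence maps shaped (beams, H, W).

    All term definitions and weights are hypothetical, not the paper's.
    Returns the weighted total and a dict of the individual terms.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    w_vox, w_grad, w_struct, w_energy = weights

    # 1) Voxel-level fidelity: mean absolute error over all pixels.
    l_vox = np.mean(np.abs(pred - target))

    # 2) Gradient term: compare finite differences along width and height.
    l_grad = (
        np.mean(np.abs(np.diff(pred, axis=-1) - np.diff(target, axis=-1)))
        + np.mean(np.abs(np.diff(pred, axis=-2) - np.diff(target, axis=-2)))
    )

    # 3) Structural term: 1 - Pearson correlation per beam (assumed proxy).
    p = pred - pred.mean(axis=(-2, -1), keepdims=True)
    t = target - target.mean(axis=(-2, -1), keepdims=True)
    denom = np.sqrt((p**2).sum(axis=(-2, -1)) * (t**2).sum(axis=(-2, -1)))
    corr = (p * t).sum(axis=(-2, -1)) / (denom + eps)
    l_struct = np.mean(1.0 - corr)

    # 4) Beam-wise energy: relative error in total fluence per beam.
    e_pred = pred.sum(axis=(-2, -1))
    e_true = target.sum(axis=(-2, -1))
    l_energy = np.mean(np.abs(e_pred - e_true) / (np.abs(e_true) + eps))

    terms = {
        "voxel": l_vox,
        "gradient": l_grad,
        "structure": l_struct,
        "energy": l_energy,
    }
    total = (w_vox * l_vox + w_grad * l_grad
             + w_struct * l_struct + w_energy * l_energy)
    return total, terms
```

Calling `far_loss(pred, target)` on two arrays of equal shape returns the scalar total together with the per-term breakdown, which makes it easy to see which component dominates. In this sketch, a prediction that is a uniformly scaled copy of the target keeps a near-zero structural term while the energy term grows with the scale error, which is the kind of failure a beam-wise energy constraint is meant to penalize.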