AI Summary
This study addresses privacy preservation and data heterogeneity challenges in cross-institutional dose prediction for radiotherapy. We propose the first federated learning (FL) framework tailored for knowledge-based planning (KBP), built upon a 3D U-Net architecture and evaluated on the OpenKBP dataset under a multi-center, non-IID simulation. Our contributions are threefold: (1) the first systematic validation of FL's efficacy in KBP; (2) empirical demonstration that non-IID data distribution severely degrades model performance; and (3) identification of standard FedAvg's limitations in mitigating inter-site performance disparity, thereby justifying customized aggregation strategies. Experiments show that FL consistently outperforms single-site training, under which larger sites beat smaller sites by up to 19% on test scores; that FL achieves performance comparable to centralized training under IID conditions; and that site-adaptive aggregation is needed to bridge performance gaps under non-IID settings.
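The standard FedAvg aggregation mentioned above combines per-site model updates into a single global model by averaging parameters, weighted by each site's case count. The following is a minimal NumPy sketch of that weighted averaging step, not the paper's implementation; the parameter names and site sizes are illustrative:

```python
import numpy as np

def fedavg(site_params, site_sizes):
    """FedAvg aggregation: average per-site parameters weighted by case count.

    site_params: list of dicts mapping parameter name -> np.ndarray
    site_sizes:  list of case counts, one per site
    """
    total = sum(site_sizes)
    merged = {}
    for name in site_params[0]:
        # Each site's contribution is proportional to its share of the data,
        # so larger sites dominate the global model under non-IID splits.
        merged[name] = sum(
            (n / total) * params[name]
            for params, n in zip(site_params, site_sizes)
        )
    return merged

# Two hypothetical sites, each holding one small parameter tensor
site_a = {"conv1": np.array([1.0, 2.0])}
site_b = {"conv1": np.array([3.0, 4.0])}
global_model = fedavg([site_a, site_b], site_sizes=[60, 20])
print(global_model["conv1"])  # -> [1.5 2.5], pulled toward the larger site
```

Because the weights are purely proportional to data volume, plain FedAvg has no mechanism to compensate smaller or atypical sites, which is consistent with the inter-site disparity observed in the study.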
Abstract
Dose prediction plays a key role in knowledge-based planning (KBP) by automatically generating patient-specific dose distributions. Recent advances in deep-learning-based dose prediction methods necessitate collaboration among data contributors for improved performance. Federated learning (FL) has emerged as a solution, enabling medical centers to jointly train deep-learning models without compromising patient data privacy. We developed the FedKBP framework to evaluate the performance of centralized, federated, and individual (i.e., separate) training of a dose prediction model on 340 plans from the OpenKBP dataset. To simulate FL and individual training, we divided the data into 8 training sites. To evaluate the effect of inter-site data variation on model training, we implemented two types of case distribution: 1) independent and identically distributed (IID), where the training and validation cases were evenly divided among the 8 sites, and 2) non-IID, where some sites have more cases than others. The results show that FL consistently outperforms individual training in both model optimization speed and out-of-sample testing scores, highlighting the advantage of FL over individual training. Under the IID data division, FL shows performance comparable to centralized training, underscoring FL as a promising alternative to traditional pooled-data training. Under the non-IID division, larger sites outperformed smaller sites by up to 19% on testing scores, confirming the need for collaboration among data owners to achieve better prediction accuracy. Meanwhile, non-IID FL showed reduced performance compared to IID FL, indicating the need for more sophisticated FL methods beyond simple model averaging to handle data variation among participating sites.
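The IID and non-IID divisions described above can be simulated by partitioning a pool of case IDs across the 8 sites, evenly for IID and with skewed fractions for non-IID. This is a hedged sketch of such a splitter; the case count and the non-IID fractions are illustrative assumptions, not the exact split used in the study:

```python
import random

def split_cases(case_ids, site_fractions, seed=0):
    """Partition case IDs across sites according to the given fractions."""
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    sites, start = [], 0
    for frac in site_fractions:
        n = round(frac * len(ids))
        sites.append(ids[start:start + n])
        start += n
    # Any rounding remainder goes to the last site.
    sites[-1].extend(ids[start:])
    return sites

cases = range(240)  # hypothetical pool of training cases

# IID: 8 equally sized sites (30 cases each here)
iid_sites = split_cases(cases, [1 / 8] * 8)

# Non-IID: two large sites and six smaller ones (illustrative fractions)
non_iid_sites = split_cases(
    cases, [0.25, 0.25, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]
)
```

Under the non-IID split, per-site models see very different amounts of data, which is what drives the inter-site performance gaps reported in the abstract.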