🤖 AI Summary
This study addresses tooth segmentation in multi-center dental panoramic radiographs under federated learning (FL), focusing on robustness to data-quality heterogeneity (including label noise and image degradation) and comparing FL against centralized and local learning.
Method: We implement FL with an Attention U-Net architecture atop the Flower framework and propose a client-level anomaly-detection mechanism based on loss-curve dynamics to identify participants with corrupted or low-quality data. Statistical significance of segmentation performance is assessed with non-parametric Wilcoxon signed-rank tests on Dice, IoU, the 95th-percentile Hausdorff Distance (HD95), and the Average Symmetric Surface Distance (ASSD).
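The paper does not publish the anomaly-detection code, but the idea of flagging clients from their loss trajectories can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the robust z-score rule, and the threshold are all my own assumptions.

```python
import numpy as np

def flag_anomalous_clients(val_losses, round_idx=-1, z_thresh=3.0):
    """Flag clients whose validation loss deviates from the cohort.

    val_losses: dict mapping client id -> list of per-round validation losses
    (hypothetical format; the server would collect these each FL round).
    Uses a robust z-score (median / MAD) on the losses at the chosen round,
    so one corrupted site cannot skew the reference statistics.
    """
    ids = list(val_losses)
    losses = np.array([val_losses[c][round_idx] for c in ids])
    med = np.median(losses)
    mad = np.median(np.abs(losses - med)) or 1e-8  # guard against zero spread
    z = 0.6745 * (losses - med) / mad              # MAD-based z-score
    return [c for c, score in zip(ids, z) if abs(score) > z_thresh]
```

In practice one would inspect the full trajectory, not just the last round; a site whose validation loss plateaus far above its peers (e.g. because of dilated or missing labels) stands out in exactly this way.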
Contribution/Results: FL achieves the best or near-best performance in every setting, including baseline and degraded-data scenarios (median Dice = 0.9489, ASSD = 1.31), and significantly outperforms local learning (p < 0.01) while preserving data privacy and supporting clinically feasible deployment.
📝 Abstract
Objectives: Federated learning (FL) may mitigate privacy constraints, heterogeneous data quality, and inconsistent labeling in dental diagnostic AI. We compared FL with centralized (CL) and local learning (LL) for tooth segmentation in panoramic radiographs across multiple data-corruption scenarios.

Methods: An Attention U-Net was trained on 2066 radiographs from six institutions across four settings: baseline (unaltered data); label manipulation (dilated/missing annotations); image-quality manipulation (additive Gaussian noise); and exclusion of a faulty client with corrupted data. FL was implemented via the Flower AI framework. Per-client training- and validation-loss trajectories were monitored for anomaly detection, and a set of metrics (Dice, IoU, HD, HD95, and ASSD) was evaluated on a hold-out test set. Statistical significance of metric differences was assessed with the Wilcoxon signed-rank test. CL and LL served as comparators.

Results: Baseline: FL achieved a median Dice of 0.94889 (ASSD: 1.33229), slightly better than CL at 0.94706 (ASSD: 1.37074) and LL at 0.93557-0.94026 (ASSD: 1.51910-1.69777). Label manipulation: FL maintained the best median Dice at 0.94884 (ASSD: 1.46487) versus CL's 0.94183 (ASSD: 1.75738) and LL's 0.93003-0.94026 (ASSD: 1.51910-2.11462). Image noise: FL led with a Dice of 0.94853 (ASSD: 1.31088); CL scored 0.94787 (ASSD: 1.36131); LL ranged from 0.93179-0.94026 (ASSD: 1.51910-1.77350). Faulty-client exclusion: FL reached a Dice of 0.94790 (ASSD: 1.33113), better than CL's 0.94550 (ASSD: 1.39318). Loss-curve monitoring reliably flagged the corrupted site.

Conclusions: FL matches or exceeds CL and outperforms LL across corruption scenarios while preserving privacy. Per-client loss trajectories provide an effective anomaly-detection mechanism and support FL as a practical, privacy-preserving approach for scalable clinical AI deployment.
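The paired, non-parametric comparison described in the abstract can be reproduced in outline with `scipy.stats.wilcoxon`, which tests whether paired differences (here, per-image Dice scores from two training regimes) are symmetrically distributed around zero. The data below are synthetic placeholders, not the study's results.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical per-image Dice scores on a shared hold-out test set.
dice_fl = np.clip(rng.normal(0.948, 0.010, size=50), 0.0, 1.0)
# LL scores made systematically lower to illustrate a significant difference.
dice_ll = dice_fl - np.abs(rng.normal(0.010, 0.005, size=50))

# Paired, non-parametric test: no normality assumption on the differences.
stat, p = wilcoxon(dice_fl, dice_ll)
print(f"W={stat:.1f}, p={p:.3g}")
```

The same call applies unchanged to IoU, HD95, or ASSD; because the test is paired, both score arrays must be evaluated on the identical set of test images.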