Impact of Labeling Inaccuracy and Image Noise on Tooth Segmentation in Panoramic Radiographs using Federated, Centralized and Local Learning

📅 2025-09-08
🤖 AI Summary
This study addresses tooth segmentation in multi-center dental panoramic radiographs under federated learning (FL), focusing on robustness to heterogeneous data quality (label noise and image degradation) and comparing FL against centralized and local learning. Method: FL is implemented with an Attention U-Net on the Flower framework, and per-client training- and validation-loss curves are monitored to identify participants with corrupted or low-quality data. Differences in segmentation performance are tested for significance with Wilcoxon signed-rank tests on Dice, IoU, 95th-percentile Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Contribution/Results: FL achieves the best median Dice and ASSD in every reported setting, including the degraded-data scenarios (baseline: median Dice 0.9489, ASSD 1.33), matching or slightly exceeding centralized learning and significantly outperforming local learning (p < 0.01) while preserving data privacy.
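
The Wilcoxon signed-rank test mentioned above compares paired per-image scores from two models. The following is a minimal sketch of the rank-sum statistic only (no p-value), written for illustration; the function name and the handling of zeros and ties are assumptions, not the authors' code.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span. Illustrative sketch, not the paper's code.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

The test statistic is usually taken as the smaller of the two sums; a small value relative to n(n+1)/2 suggests that one model scores consistently higher on the same images.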

📝 Abstract
Objectives: Federated learning (FL) may mitigate privacy constraints, heterogeneous data quality, and inconsistent labeling in dental diagnostic AI. We compared FL with centralized learning (CL) and local learning (LL) for tooth segmentation in panoramic radiographs across multiple data corruption scenarios.

Methods: An Attention U-Net was trained on 2066 radiographs from six institutions across four settings: baseline (unaltered data); label manipulation (dilated/missing annotations); image-quality manipulation (additive Gaussian noise); and exclusion of a faulty client with corrupted data. FL was implemented via the Flower AI framework. Per-client training- and validation-loss trajectories were monitored for anomaly detection, and a set of metrics (Dice, IoU, HD, HD95 and ASSD) was evaluated on a hold-out test set. Statistical significance of differences in these metrics was assessed with the Wilcoxon signed-rank test. CL and LL served as comparators.

Results: Baseline: FL achieved a median Dice of 0.94889 (ASSD: 1.33229), slightly better than CL at 0.94706 (ASSD: 1.37074) and LL at 0.93557-0.94026 (ASSD: 1.51910-1.69777). Label manipulation: FL maintained the best median Dice score at 0.94884 (ASSD: 1.46487) versus CL's 0.94183 (ASSD: 1.75738) and LL's 0.93003-0.94026 (ASSD: 1.51910-2.11462). Image noise: FL led with a Dice of 0.94853 (ASSD: 1.31088); CL scored 0.94787 (ASSD: 1.36131); LL ranged from 0.93179 to 0.94026 (ASSD: 1.51910-1.77350). Faulty-client exclusion: FL reached a Dice of 0.94790 (ASSD: 1.33113), better than CL's 0.94550 (ASSD: 1.39318). Loss-curve monitoring reliably flagged the corrupted site.

Conclusions: FL matches or exceeds CL and outperforms LL across corruption scenarios while preserving privacy. Per-client loss trajectories provide an effective anomaly-detection mechanism and support FL as a practical, privacy-preserving approach for scalable clinical AI deployment.
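
For readers unfamiliar with the overlap metrics reported above, Dice and IoU for a pair of binary masks can be sketched as follows. This is an illustrative, dependency-free version; the function name and the convention for two empty masks are assumptions, and the paper's actual evaluation code is not shown in the source.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Illustrative sketch; not the evaluation code used in the paper.
    """
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels set in both masks
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / union
    return dice, iou
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image; the boundary-based metrics (HD, HD95, ASSD) add information about contour errors that overlap scores can hide.
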
Problem

Research questions and friction points this paper is trying to address.

Evaluating federated learning for tooth segmentation in dental radiographs
Assessing impact of labeling inaccuracies and image noise on segmentation
Comparing federated, centralized, and local learning approaches
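
The two corruption types named above, dilated annotations and additive Gaussian noise, can be simulated along the following lines. This is a rough sketch under stated assumptions: the 3x3 structuring element, the [0, 1] intensity range, the clipping step, and the parameter names are all illustrative choices that the source does not specify.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise to a grayscale image (nested lists, values in [0, 1]).

    Results are clipped back to [0, 1]. sigma and seed are illustrative parameters.
    """
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def dilate_mask(mask, iterations=1):
    """Binary dilation of a 0/1 mask with an assumed 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # copy so the input is not modified
    for _ in range(iterations):
        src = out
        out = [[0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                # A pixel is set if any pixel in its 3x3 neighbourhood was set.
                if any(src[rr][cc]
                       for rr in range(max(0, r - 1), min(h, r + 2))
                       for cc in range(max(0, c - 1), min(w, c + 2))):
                    out[r][c] = 1
    return out
```

Dilation inflates every tooth outline by roughly one pixel per iteration, which mimics systematically over-generous annotations; the noise function degrades the image while leaving the labels untouched.
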
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning with Attention U-Net architecture
Flower AI framework for distributed training
Loss-curve monitoring for anomaly detection
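
The source does not describe the exact rule used to flag a faulty client from its loss curves. As one possible illustration, the sketch below compares each client's late-round validation loss against the other clients using a median/MAD outlier score; the function name, the `last_k` window, the 3.5 threshold, and the site identifiers are all hypothetical.

```python
from statistics import median


def flag_anomalous_clients(val_losses, last_k=3, threshold=3.5):
    """Flag clients whose late-round validation loss is unusually high.

    val_losses maps a client id to its list of per-round validation losses.
    Hypothetical rule (not from the paper): a modified z-score based on the
    median absolute deviation (MAD) across clients.
    """
    # Average the last `last_k` rounds of each client's validation loss.
    tail = {cid: sum(losses[-last_k:]) / len(losses[-last_k:])
            for cid, losses in val_losses.items()}
    med = median(tail.values())
    mad = median(abs(v - med) for v in tail.values())
    if mad == 0:
        return []  # no spread across clients to compare against
    return [cid for cid, v in tail.items()
            if 0.6745 * (v - med) / mad > threshold]


# Made-up loss curves for six sites; only site_6 fails to converge.
example = {
    "site_1": [0.40, 0.20, 0.12, 0.10, 0.10],
    "site_2": [0.42, 0.22, 0.13, 0.11, 0.10],
    "site_3": [0.38, 0.19, 0.11, 0.10, 0.09],
    "site_4": [0.41, 0.21, 0.14, 0.12, 0.11],
    "site_5": [0.39, 0.20, 0.12, 0.11, 0.10],
    "site_6": [0.45, 0.40, 0.38, 0.39, 0.41],
}
flagged = flag_anomalous_clients(example)
```

A robust statistic such as the MAD is used here so that a single badly behaved site does not inflate the reference spread it is judged against; in practice the curves would also be inspected visually before excluding a client.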