🤖 AI Summary
Problem: Rare diseases such as collagen VI-related dystrophies (COL6-RD) suffer from scarce, fragmented, and privacy-restricted data, which impedes the development of robust machine learning-based diagnostic models.
Method: We propose a novel cross-institutional federated learning framework for COL6-RD diagnosis, built on the Sherpa.ai FL platform and trained on collagen VI immunofluorescence images of patient-derived fibroblast cultures held by two international organizations. A convolutional neural network is trained collaboratively without raw data leaving the local sites, and it classifies images into three pathogenic mechanism groups: exon skipping, glycine substitution, and pseudoexon insertion.
Contribution/Results: This work applies federated learning to mechanism-level diagnosis in COL6-RD, overcoming data silos and privacy constraints. The federated model achieves an F1-score of 0.82, outperforming single-organization baselines (0.57 to 0.75), which indicates better diagnostic utility and generalizability than isolated institutional models. The authors anticipate that the approach will also support interpretation of variants of uncertain significance (VUS) and help prioritize sequencing strategies for identifying novel pathogenic variants.
📝 Abstract
The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of available data. Attempts to expand sampling across hospitals, institutions, or countries with differing regulations face severe privacy, regulatory, and logistical obstacles that are often difficult to overcome. Federated Learning (FL) provides a promising solution by enabling collaborative model training across decentralized datasets while keeping patient data local and private. Here, we report a novel global FL initiative using the Sherpa.ai FL platform, which leverages FL across distributed datasets in two international organizations for the diagnosis of COL6-RD, using collagen VI immunofluorescence microscopy images from patient-derived fibroblast cultures. Our solution resulted in an ML model capable of classifying collagen VI patient images into the three primary pathogenic mechanism groups associated with COL6-RD: exon skipping, glycine substitution, and pseudoexon insertion. This new approach achieved an F1-score of 0.82, outperforming single-organization models (0.57 to 0.75). These results demonstrate that FL substantially improves diagnostic utility and generalizability compared to isolated institutional models. Beyond enabling more accurate diagnosis, we anticipate that this approach will support the interpretation of variants of uncertain significance and guide the prioritization of sequencing strategies to identify novel pathogenic variants.
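To make the idea of "collaborative model training while keeping patient data local" concrete, the following is a minimal sketch of one server-side aggregation round in the style of federated averaging (FedAvg). It is an illustration only: the abstract does not state which aggregation rule the authors used, and this does not use the Sherpa.ai platform API. The function name `federated_average`, the parameter vectors, and the site sizes are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site model parameters into one global parameter vector.

    Each site trains locally and shares only its parameters and its number
    of training samples; raw images never leave the site. The global model
    is the sample-size-weighted mean of the site parameters (FedAvg-style).
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one parameter vector and one size per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same model shape")

    # Weighted mean, computed parameter by parameter.
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical values: two organizations with different dataset sizes.
site_a = [0.2, -1.0, 0.5]   # parameters after local training at site A
site_b = [0.6, 0.0, 0.1]    # parameters after local training at site B
global_weights = federated_average([site_a, site_b], [300, 100])
print(global_weights)
```

In a full training loop, the global parameters would be sent back to each site, local training would resume from them, and the round would repeat. Weighting by sample count means the larger site contributes more to the global model, which is one common design choice rather than the only one.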