Training Together, Diagnosing Better: Federated Learning for Collagen VI-Related Dystrophies

📅 2025-12-18
🤖 AI Summary
Rare diseases like collagen VI-related dystrophies (COL6-RD) suffer from scarce, fragmented, and privacy-restricted data, which impedes the development of robust machine learning–based diagnostic models. Method: We propose the first cross-institutional federated learning framework for COL6-RD diagnosis, leveraging multi-center, distributed immunofluorescence images of dermal fibroblasts. A convolutional neural network is collaboratively trained without raw data leaving local sites, and it classifies images into three pathogenic mechanism groups: exon skipping, glycine substitutions, and pseudoexon insertions. Contribution/Results: This work applies federated learning to mechanism-level diagnosis in COL6-RD, working around data silos and privacy constraints. The framework achieves an F1-score of 0.82, surpassing single-center baselines (0.57–0.75), which indicates better diagnostic utility and generalizability than isolated institutional models. The approach is also expected to support clinical interpretation of variants of uncertain significance (VUS) and to help prioritize sequencing strategies for identifying novel pathogenic variants.

📝 Abstract
The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of available data. Attempts to expand sampling across hospitals, institutions, or countries with differing regulations face severe privacy, regulatory, and logistical obstacles that are often difficult to overcome. Federated Learning (FL) provides a promising solution by enabling collaborative model training across decentralized datasets while keeping patient data local and private. Here, we report a novel global FL initiative using the Sherpa.ai FL platform, which leverages FL across distributed datasets in two international organizations for the diagnosis of COL6-RD, using collagen VI immunofluorescence microscopy images from patient-derived fibroblast cultures. Our solution resulted in an ML model capable of classifying collagen VI patient images into the three primary pathogenic mechanism groups associated with COL6-RD: exon skipping, glycine substitution, and pseudoexon insertion. This new approach achieved an F1-score of 0.82, outperforming single-organization models (0.57–0.75). These results demonstrate that FL substantially improves diagnostic utility and generalizability compared to isolated institutional models. Beyond enabling more accurate diagnosis, we anticipate that this approach will support the interpretation of variants of uncertain significance and guide the prioritization of sequencing strategies to identify novel pathogenic variants.
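To make the reported metric concrete, here is a minimal sketch of how an F1-score can be computed for a three-class classifier. The abstract does not say which averaging scheme was used, so the macro average below is an assumption, and the labels and predictions are made-up illustrative values, not data from the paper.

```python
from typing import Sequence

# The three mechanism groups named in the abstract.
CLASSES = ["exon skipping", "glycine substitution", "pseudoexon insertion"]


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating it as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the source)."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical ground truth and predictions for six images.
ES, GS, PI = CLASSES
y_true = [ES, ES, GS, GS, PI, PI]
y_pred = [ES, GS, GS, GS, PI, ES]

score = macro_f1(y_true, y_pred, CLASSES)
print(f"macro F1 = {score:.3f}")
```

Macro averaging weights each mechanism group equally, which matters when one group has far fewer images than the others. A sample-weighted average would be an equally plausible reading of the reported figure.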
Problem

Research questions and friction points this paper is trying to address.

Diagnosing rare collagen VI dystrophies with fragmented patient data across institutions
Overcoming privacy and regulatory barriers in multi-center medical data sharing
Improving diagnostic accuracy for genetic variants using decentralized collaborative learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning enables collaborative training across decentralized datasets
Sherpa.ai platform leverages distributed data from international organizations
Model classifies collagen VI images into three pathogenic mechanism groups
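The collaborative training idea in the points above can be sketched with federated averaging (FedAvg), a common aggregation rule in which each site trains locally and shares only model weights. The paper summary does not state which aggregation rule the Sherpa.ai platform uses, so FedAvg is an assumption here, and the function name, parameter names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Model weights as a mapping from parameter name to a flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its local sample count.

    Only weights and sample counts are exchanged; raw images never leave a site.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Two hypothetical sites with locally trained weights and local image counts.
site_a = ({"conv1": [0.2, 0.4], "bias": [1.0]}, 300)
site_b = ({"conv1": [0.6, 0.0], "bias": [0.0]}, 100)

global_weights = federated_average([site_a, site_b])
print(global_weights)
```

In a full training run this aggregation step would repeat over many rounds: the server sends the averaged weights back, each site continues training on its own images, and the cycle continues until the global model converges.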
Authors
Astrid Brull
Neurogenetics and Neuromuscular Disorders of Childhood Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
Sara Aguti
Neurodegenerative Disease Department, UCL Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
Véronique Bolduc
Neurogenetics and Neuromuscular Disorders of Childhood Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
Ying Hu
Professor of Mathematics, Université Rennes
stochastic analysis, control and optimization, mathematical finance
Daniel M. Jimenez-Gutierrez
Sherpa.ai, Erandio, Bizkaia 48950, Spain
Enrique Zuazua
FAU Erlangen/Humboldt Prof. & Deusto-Bilbao & UAM-Madrid & Sherpa.ai Chief Algorithm Scientist
Applied Mathematics, Partial Differential Equations, Numerics, Control, Machine Learning
Joaquin Del-Rio
Sherpa.ai, Erandio, Bizkaia 48950, Spain
Oleksii Sliusarenko
Sherpa.ai, Erandio, Bizkaia 48950, Spain
Haiyan Zhou
Genetics and Genomic Medicine Research and Teaching Department, Great Ormond Street Institute of Child Health, University College London, London WC1N 1EH, UK
F. Muntoni
Neurodegenerative Disease Department, UCL Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
Carsten G. Bonnemann
Neurogenetics and Neuromuscular Disorders of Childhood Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
Xabi Uribe-Etxebarria
Sherpa.ai, Erandio, Bizkaia 48950, Spain