Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data

📅 2018-10-19

🏛️ IEEE International Symposium on Biomedical Imaging

📈 Citations: 167

✨ Influential: 3

🤖 AI Summary

Problem: Multi-center neuroimaging data remain fragmented across institutions because privacy and legal constraints prevent direct sharing, which hinders integrative analysis of subcortical brain structures. Method: The paper proposes a federated learning framework for meta-analyzing distributed biomedical data, so that heterogeneous cohorts (ADNI, PPMI, MIRIAD and UK Biobank) can be modeled jointly without sharing individual-level data. Contribution/Results: The framework is first tested on synthetic data and then applied to the four real-world cohorts to investigate brain structural relationships across diseases, such as Alzheimer's and Parkinson's, and across clinical cohorts. The results suggest that the approach can work around data silos and could support further distributed, multi-centric studies of brain disorders and their genetic underpinnings.

📝 Abstract

At this moment, databanks worldwide contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.
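To make the idea of analyzing data "without sharing individual information" concrete, here is a minimal sketch of federated standardization. It is an illustration under stated assumptions, not the paper's algorithm: each site shares only three aggregates (count, sum, sum of squares), from which a coordinator recovers the pooled mean and standard deviation. The names `SiteSummary`, `local_summary` and `global_mean_std`, and the toy values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregates a site shares; no individual measurements are included."""
    n: int
    total: float
    total_sq: float


def local_summary(values):
    """Runs inside one institution; only the three aggregates leave the site."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def global_mean_std(summaries):
    """Runs at the coordinator; combines site aggregates into pooled statistics."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Population variance via E[x^2] - E[x]^2, clamped against rounding error.
    variance = max(total_sq / n - mean ** 2, 0.0)
    return mean, math.sqrt(variance)


# Hypothetical toy measurements held at two separate sites.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]

mean, std = global_mean_std([local_summary(site_a), local_summary(site_b)])
print(mean, std)
```

The pooled statistics match what a centralized computation over all five values would give, yet the coordinator never sees a single subject's measurement. A real deployment would also need to consider very small sites, where aggregates alone can reveal individual values.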

Problem

Research questions and friction points this paper is trying to address.

Privacy Protection

Medical Data Sharing

Neurogenetic Research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Privacy Protection

Brain Imaging Data

🔎 Similar Papers

From Challenges and Pitfalls to Recommendations and Opportunities: Implementing Federated Learning in Healthcare