Federated Variational Inference for Bayesian Mixture Models

📅 2025-02-18
🤖 AI Summary
To address privacy-sensitive clustering of large-scale binary and categorical data under federated learning, this paper proposes a variational inference-based framework for federated Bayesian mixture modeling. Each client performs local variational inference on its own batch and shares only summaries of the data, never the raw records, so the data stay local. Cluster structure is discovered in two stages: local merge and delete moves within each batch, run in parallel, followed by global merge moves across batches that recover a shared clustering. Because the merge moves need only per-batch summaries, the full dataset never has to be pooled. Experiments on simulated and benchmark datasets show that the method performs well in comparison with existing clustering algorithms, and an application to large-scale electronic health record (EHR) data illustrates its practical utility.

📝 Abstract
We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled 'divide and conquer' inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by 'global' merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well in comparison to existing clustering algorithms. We validate the practical utility of the method by applying it to large-scale electronic health record (EHR) data.
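The claim that merge moves need only per-batch summaries can be illustrated with a small sketch. It is not the paper's algorithm: it assumes hard cluster assignments, binary features, and a collapsed Beta-Bernoulli marginal likelihood as the merge score, whereas the paper uses a variational criterion. All names (`summarize`, `merge`, `merge_gain`) and the Beta(1, 1) prior are hypothetical choices made for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def summarize(rows):
    """Summary a node would share for one local cluster of binary rows:
    (number of rows, per-feature count of ones). Raw rows stay on the node."""
    n = len(rows)
    ones = [sum(col) for col in zip(*rows)]
    return n, ones


def merge(s1, s2):
    """Summary of the union of two clusters: the summaries simply add."""
    n1, ones1 = s1
    n2, ones2 = s2
    return n1 + n2, [a + b for a, b in zip(ones1, ones2)]


def log_marginal(summary, a=1.0, b=1.0):
    """Log marginal likelihood of one cluster with independent
    Beta(a, b)-Bernoulli features (an assumed stand-in for the paper's score)."""
    n, ones = summary
    return sum(log_beta(a + k, b + n - k) - log_beta(a, b) for k in ones)


def merge_gain(s1, s2):
    """Positive when one merged cluster explains the data better than two."""
    return log_marginal(merge(s1, s2)) - log_marginal(s1) - log_marginal(s2)


# Toy clusters held on two different nodes; only summaries are exchanged.
node_a = summarize([[1, 1, 0], [1, 1, 0], [1, 0, 0]])
node_b_similar = summarize([[1, 1, 0], [1, 1, 0]])
node_b_other = summarize([[0, 0, 1], [0, 1, 1]])

print(merge_gain(node_a, node_b_similar))  # positive: candidate for a global merge
print(merge_gain(node_a, node_b_other))    # negative: keep the clusters separate
```

The point of the sketch is that `merge_gain` never touches raw rows: the counts are additive, so a coordinator could score candidate global merges from the shared summaries alone.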
Problem

Research questions and friction points this paper is trying to address.

Federated learning for Bayesian clustering
Scalable inference for large datasets
Privacy-preserving global clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for Bayesian models
Variational inference with local moves
Global merge without full data sharing
Jackie Rao
MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
Francesca L. Crowe
Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
Tom Marshall
Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
Sylvia Richardson
Director of MRC Biostatistics Unit and Professor of Biostatistics, University of Cambridge
Statistical genomics, high-dimensional data, bioinformatics
Paul D. W. Kirk
MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom