AI Summary
Federated training of generalized linear models (GLMs) faces challenges including data privacy leakage, low statistical efficiency, and poor cross-institutional scalability. Method: We propose the first privacy-preserving distributed GLM training framework supporting streaming data and heterogeneous data distributions. Our approach extends privacy-preserving linear regression to the entire GLM family by integrating secure multi-party computation (SMPC) with iteratively reweighted least squares (IRLS), while incorporating either differential privacy (DP) or secure aggregation at the sufficient-statistic or gradient level. Contribution/Results: The framework provides rigorous ε-differential privacy guarantees. Empirical evaluation demonstrates that its estimation accuracy matches centralized maximum likelihood estimation (MLE) while reducing communication overhead by 40%. This significantly enhances the practicality, security, and generalizability of federated GLM modeling.
Abstract
This paper presents a novel approach to classical linear regression that enables model computation from data streams or distributed data while preserving privacy in federated environments. We extend this framework to generalized linear models (GLMs), ensuring scalability and adaptability to diverse data distributions while maintaining privacy-preserving properties. To assess the effectiveness of our approach, we conduct numerical studies on both simulated and real datasets, comparing our method with conventional maximum likelihood estimation for GLMs via iteratively reweighted least squares. Our results demonstrate the advantages of the proposed method in distributed and federated settings.
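The key observation exploited by distributed IRLS is that each Newton step for a GLM depends on the data only through the aggregates X^T W X and X^T W z, which sum across sites. The sketch below illustrates this for logistic regression: it is a minimal, hypothetical NumPy illustration of the general idea, not the paper's implementation, and the SMPC / secure-aggregation / DP layers that would protect the exchanged statistics are deliberately omitted.

```python
import numpy as np

def local_irls_stats(X, y, beta):
    """Per-site sufficient statistics for one IRLS step (logistic link).

    Each site returns only X^T W X and X^T W z, never raw records; in a
    privacy-preserving deployment these aggregates would additionally be
    protected by secure aggregation or differential privacy (not shown).
    """
    eta = X @ beta                                 # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))                # mean via logistic link
    w = mu * (1.0 - mu)                            # IRLS working weights
    z = eta + (y - mu) / np.clip(w, 1e-10, None)   # working response
    return X.T @ (w[:, None] * X), X.T @ (w * z)

def federated_irls(sites, p, n_iter=25):
    """Coordinator: sum per-site statistics, then solve the weighted LS step."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        XtWX = np.zeros((p, p))
        XtWz = np.zeros(p)
        for X, y in sites:
            a, b = local_irls_stats(X, y, beta)
            XtWX += a
            XtWz += b
        beta = np.linalg.solve(XtWX, XtWz)         # Newton/IRLS update
    return beta
```

Because the per-site statistics sum exactly, this distributed iteration reproduces the centralized IRLS trajectory, which is why the estimation accuracy can match pooled MLE; a streaming variant would accumulate the same aggregates over incoming batches instead of over sites.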