🤖 AI Summary
This work addresses online federated classification for streaming data arriving at multiple clients. It proposes a renewable estimation framework that combines Generalized Distance-Weighted Discrimination (GDWD) with a Majorization-Minimization (MM) optimization algorithm, so that the model can be updated as new data arrive without full retraining, under both homogeneous and heterogeneous data distributions across clients. Theoretically, the authors establish consistency, asymptotic normality, and Bayesian risk consistency of the estimator under standard regularity conditions. They also incorporate differential privacy mechanisms to protect client information while maintaining model performance. Empirically, experiments on both simulated and real-world datasets show high classification accuracy, with gains in computational efficiency and substantial savings in data storage compared to existing methods.
📝 Abstract
In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a new optimization algorithm based on the Majorization-Minimization principle, integrated with a renewable estimation procedure, enabling efficient model updates without full retraining. We provide a theoretical guarantee for the convergence of our estimator, proving its consistency and asymptotic normality under standard regularity conditions. In addition, we establish that our method achieves Bayesian risk consistency, ensuring its reliability for classification tasks in federated environments. We further incorporate differential privacy mechanisms to enhance data security, protecting client information while maintaining model performance. Extensive numerical experiments on both simulated and real-world datasets demonstrate that our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.
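As a rough illustration of two of the ingredients named above, the generalized DWD loss and an MM update, the following Python sketch fits a linear classifier on a single machine. It is not the paper's algorithm: the renewable (streaming) update, the federated aggregation across clients, and the differential privacy mechanism are all omitted. The loss form V_q and the curvature bound (q+1)^2/q are the standard ones from the generalized DWD literature; the ridge penalty, the absence of an intercept, and the names `gdwd_loss`, `gdwd_grad`, `objective`, and `fit_gdwd_mm` are assumptions made for this example.

```python
import numpy as np

def gdwd_loss(u, q=1.0):
    """Generalized DWD loss V_q(u): 1 - u for u <= q/(q+1), polynomial decay beyond."""
    u = np.asarray(u, dtype=float)
    thresh = q / (q + 1.0)
    c = q**q / (q + 1.0) ** (q + 1.0)
    safe_u = np.where(u > thresh, u, 1.0)  # keep the unused branch well-defined
    return np.where(u > thresh, c / safe_u**q, 1.0 - u)

def gdwd_grad(u, q=1.0):
    """Derivative V_q'(u); it is continuous and Lipschitz with constant (q+1)^2/q."""
    u = np.asarray(u, dtype=float)
    thresh = q / (q + 1.0)
    c = q**q / (q + 1.0) ** (q + 1.0)
    safe_u = np.where(u > thresh, u, 1.0)
    return np.where(u > thresh, -q * c / safe_u ** (q + 1.0), -1.0)

def objective(beta, X, y, lam, q=1.0):
    """Average GDWD loss on margins y_i * x_i^T beta plus a ridge penalty (assumed)."""
    return gdwd_loss(y * (X @ beta), q).mean() + 0.5 * lam * beta @ beta

def fit_gdwd_mm(X, y, lam=0.1, q=1.0, n_iter=50):
    """MM iterations: minimize a quadratic majorizer of the objective at each step."""
    n, p = X.shape
    M = (q + 1.0) ** 2 / q                      # bound on the curvature of V_q
    H = (M / n) * X.T @ X + lam * np.eye(p)     # fixed Hessian of the majorizer
    beta = np.zeros(p)
    history = [objective(beta, X, y, lam, q)]
    for _ in range(n_iter):
        u = y * (X @ beta)
        grad = X.T @ (y * gdwd_grad(u, q)) / n + lam * beta
        beta = beta - np.linalg.solve(H, grad)  # exact minimizer of the majorizer
        history.append(objective(beta, X, y, lam, q))
    return beta, history
```

Because each step exactly minimizes a quadratic upper bound that touches the objective at the current iterate, the recorded objective values never increase. The matrix `H` does not change across iterations, so in practice it could be factorized once and reused.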