🤖 AI Summary
Problem: In federated learning, highly heterogeneous and scarce client data impede effective personalized modeling. Method: We propose PAC-PFL, a personalized federated learning algorithm grounded in the PAC-Bayesian framework that jointly learns a shared hyper-posterior distribution, enabling Bayesian personalized posterior inference per client. PAC-PFL is the first to integrate PAC-Bayesian generalization bounds with differential privacy to handle data-dependent priors, unifies global collaboration and local adaptation via hyper-posterior modeling, and combines variational Bayesian inference, federated hyperparameter learning, and Dirichlet-based non-IID data partitioning. Results: Experiments on photovoltaic power forecasting, FEMNIST, and Dirichlet-EMNIST demonstrate that PAC-PFL reduces average prediction error by 12.7% over baselines and improves expected calibration error (ECE) by over 40%, significantly enhancing generalization and predictive calibration, especially for clients with limited data.
📝 Abstract
Federated learning aims to infer a shared model from private and decentralized data stored locally by multiple clients. Personalized federated learning (PFL) goes one step further by adapting the global model to each client, improving the model's fit for different clients. Highly heterogeneous clients require a significant level of personalization, which can be challenging to achieve, especially when the clients have small datasets. To address this problem, we propose a PFL algorithm named PAC-PFL for learning probabilistic models within a PAC-Bayesian framework that utilizes differential privacy to handle data-dependent priors. Our algorithm collaboratively learns a shared hyper-posterior and regards each client's posterior inference as the personalization step. By establishing and minimizing a generalization bound on the average true risk of clients, PAC-PFL effectively combats over-fitting. PAC-PFL achieves accurate and well-calibrated predictions, supported by experiments on a dataset of photovoltaic panel power generation, the FEMNIST dataset (Caldas et al., 2019), and a Dirichlet-partitioned EMNIST dataset (Cohen et al., 2017).
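To illustrate what a "Dirichlet-partitioned" dataset typically means, the following is a minimal sketch of the common label-skew construction: for each class, client shares are drawn from a symmetric Dirichlet distribution, and that class's samples are split accordingly. This is not taken from the paper; the function name `dirichlet_partition`, the concentration parameter `alpha`, and the per-class splitting rule are illustrative assumptions, and the authors' exact partitioning procedure may differ.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients with label skew.

    Smaller alpha gives more heterogeneous (non-IID) clients; larger alpha
    approaches a uniform split. Assumed scheme, not the paper's exact one.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)

        # A Dirichlet(alpha, ..., alpha) draw is a set of independent
        # Gamma(alpha, 1) draws normalized to sum to one.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # guard against numerical underflow for tiny alpha
            props = [1.0 / num_clients] * num_clients
        else:
            props = [g / total for g in gammas]

        # Turn proportions into contiguous slices of this class's indices.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), max(start, int(round(cum * len(idxs)))))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy usage: 1000 samples over 10 classes, split across 5 clients.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.3, seed=1)
```

Every sample is assigned to exactly one client, so the partition sizes sum to the dataset size; with a small `alpha`, some clients end up with very few samples of a given class, which is the small-data, heterogeneous regime the abstract targets.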