🤖 AI Summary
This paper addresses the generalization of federated learning (FL), which remains less explored than in centralized learning. It systematically characterizes FL's two-level generalization error: the out-of-sample gap for participating clients and the participation gap, i.e., the risk difference between participating and non-participating clients. To this end, the authors propose a "superclient" conditional mutual information (CMI) framework, extending the conventional supersample-based CMI construction to capture FL's hierarchical structure. They derive hypothesis-based CMI bounds showing how privacy constraints in FL can imply generalization guarantees, and develop computable fast-rate evaluated CMI bounds that recover the best-known convergence rate for two-level FL generalization in the small empirical risk regime. For specific model aggregation strategies and structured loss functions, the bounds are refined to achieve improved convergence rates with respect to the number of participating clients. The resulting bounds are non-vacuous and empirically evaluable, and extensive experiments confirm that they accurately capture the generalization behavior of mainstream FL algorithms. The core contributions are the superclient CMI construction and the joint analysis of privacy and generalization in FL.
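For concreteness, the two-level decomposition referenced above can be written schematically as follows. The notation here is assumed for illustration and may differ from the paper's: $W$ denotes the aggregated FL model, $\widehat{L}$ the empirical risk on participating clients' data, and $L_{\mathrm{part}}$, $L_{\mathrm{non}}$ the population risks over participating and non-participating clients.

```latex
% Illustrative sketch only: notation (W, \widehat{L}, L_part, L_non) is
% assumed here and may not match the paper's definitions.
\mathbb{E}\bigl[L_{\mathrm{non}}(W) - \widehat{L}(W)\bigr]
  = \underbrace{\mathbb{E}\bigl[L_{\mathrm{part}}(W) - \widehat{L}(W)\bigr]}_{\text{out-of-sample gap}}
  + \underbrace{\mathbb{E}\bigl[L_{\mathrm{non}}(W) - L_{\mathrm{part}}(W)\bigr]}_{\text{participation gap}}
```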
📝 Abstract
Federated Learning (FL) is a widely adopted privacy-preserving distributed learning framework, yet its generalization performance remains less explored than that of centralized learning. In FL, the generalization error consists of two components: the out-of-sample gap, which measures the gap between the empirical and true risk for participating clients, and the participation gap, which quantifies the risk difference between participating and non-participating clients. In this work, we apply an information-theoretic analysis via the conditional mutual information (CMI) framework to study FL's two-level generalization. Beyond the traditional supersample-based CMI framework, we introduce a superclient construction to accommodate the two-level generalization setting in FL. We derive multiple CMI-based bounds, including hypothesis-based CMI bounds, illustrating how privacy constraints in FL can imply generalization guarantees. Furthermore, we propose fast-rate evaluated CMI bounds that recover the best-known convergence rate for two-level FL generalization in the small empirical risk regime. For specific FL model aggregation strategies and structured loss functions, we refine our bounds to achieve improved convergence rates with respect to the number of participating clients. Empirical evaluations confirm that our evaluated CMI bounds are non-vacuous and accurately capture the generalization behavior of FL algorithms.
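The superclient construction can be pictured by analogy with the supersample construction of Steinke and Zakynthinou (2020): binary selectors choose which candidate clients participate, and further selectors choose which half of each client's supersample is used for training. The sketch below is a minimal, hypothetical illustration of this two-level randomization; all names and the Gaussian stand-ins for client distributions are assumptions, not the paper's setup or API.

```python
import numpy as np

# Hypothetical sketch of a superclient construction: a 2 x n array of
# candidate clients, each holding a 2 x m supersample. Gaussian data
# stands in for unknown client distributions; all names are illustrative.
rng = np.random.default_rng(0)

n_clients = 8   # n participating-client slots
m_samples = 16  # per-client training sample budget
d = 5           # feature dimension

# Each of the 2 x n candidate clients draws 2 x m samples from its own
# client-specific Gaussian distribution.
client_means = rng.normal(size=(2, n_clients, d))
supersamples = client_means[..., None, :] + rng.normal(
    size=(2, n_clients, 2 * m_samples, d))

# Level-1 selectors U: which candidate in each column participates.
# Level-2 selectors V: which element of each sample pair is used for training.
U = rng.integers(0, 2, size=n_clients)               # client participation bits
V = rng.integers(0, 2, size=(n_clients, m_samples))  # per-sample training bits

# Materialize the training set the FL algorithm would see: in column j,
# client U[j] participates, and training sample i is entry 2*i + V[j, i]
# of that client's supersample.
train = np.stack([
    supersamples[U[j], j, 2 * np.arange(m_samples) + V[j]]
    for j in range(n_clients)
])  # shape: (n_clients, m_samples, d)

# A CMI-style analysis then bounds generalization via how much the learned
# model (or its losses) reveals about the bits (U, V) given the superclient
# array, i.e., quantities in the spirit of I(W; U, V | superclients).
print(train.shape)
```

The U bits govern the participation gap (which clients the model ever sees) and the V bits govern the out-of-sample gap (which samples of a participating client are used), mirroring the two-level error decomposition in the abstract.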