🤖 AI Summary
To address the challenge of personalised modelling under client distribution heterogeneity in federated learning, this paper proposes an automated group-partitioning method that requires no pre-specified clustering timing or assumptions. The method constructs a dynamic similarity metric based on cosine distances among client gradients and incorporates a temperature mechanism that detects model convergence in real time, enabling adaptive triggering of the clustering step. It further combines density-based clustering with GradCAM-based interpretability analysis to achieve robust client grouping. The authors introduce a clustering-agnostic, one-shot framework that eliminates manual specification of clustering rounds or hyperparameters. Extensive evaluation across five benchmark datasets and over forty tasks demonstrates significant improvements in both clustering timeliness and personalised model performance. Empirical results confirm that density-based clustering in gradient space has strong discriminative power for characterising differences in loss surface geometry.
📝 Abstract
Federated Learning (FL) is a widely adopted paradigm of decentralised learning that allows training a single model from multiple sources without transferring data directly between participating clients. Since its inception in 2015, it has branched into numerous subfields that deal with application-specific issues, such as data heterogeneity or resource allocation. One such subfield, Clustered Federated Learning (CFL), deals with the problem of partitioning the population of clients into separate cohorts to deliver personalised models. Although a few remarkable works have been published in this domain, the problem remains largely unexplored, as its basic assumptions and settings differ slightly from those of standard FL. In this work, we present One-Shot Clustered Federated Learning (OCFL), a clustering-agnostic algorithm that automatically detects the earliest suitable moment for clustering. Our algorithm is based on computing the cosine distance between the gradients of the clients and on a temperature measure that detects when the federated model starts to converge. We empirically evaluate our methodology by testing various one-shot clustering algorithms on over forty different tasks spanning five benchmark datasets. Our experiments demonstrate the strong performance of our approach when CFL is performed in an automated manner, without the need to adjust hyperparameters. We also revisit the practical feasibility of CFL algorithms based on the gradients of the clients, providing firm evidence of the high efficiency of density-based clustering methods for differentiating between the loss surfaces of neural networks trained on different distributions. Moreover, by inspecting the feasibility of local explanations generated with the help of GradCAM, we provide further insight into the relationship between personalisation and the explainability of local predictions.
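The core mechanism described above, grouping clients by the cosine distance between their gradient updates and then applying a density-based clustering algorithm, can be illustrated with a minimal sketch. This is not the paper's implementation: the `cluster_clients` helper, the DBSCAN parameters, and the synthetic gradients are all illustrative assumptions, and the paper's temperature-based trigger for *when* to cluster is omitted here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_clients(client_grads, eps=0.5, min_samples=2):
    """Group clients by pairwise cosine distance between their flattened gradients."""
    # Normalise each gradient vector, so the dot product gives cosine similarity.
    G = np.stack([g / (np.linalg.norm(g) + 1e-12) for g in client_grads])
    dist = 1.0 - G @ G.T                 # cosine distance in [0, 2]
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)      # guard against tiny negative fp errors
    # Density-based clustering on the precomputed distance matrix;
    # DBSCAN needs no pre-set number of clusters, matching the one-shot setting.
    return DBSCAN(eps=eps, min_samples=min_samples,
                  metric="precomputed").fit_predict(dist)

# Two synthetic "data distributions": gradients pointing in roughly opposite
# directions, as might arise from clients holding incompatible label skews.
rng = np.random.default_rng(0)
grads = ([rng.normal(loc=+1.0, scale=0.1, size=50) for _ in range(5)]
         + [rng.normal(loc=-1.0, scale=0.1, size=50) for _ in range(5)])
labels = cluster_clients(grads)
print(labels)  # the two groups of five clients receive distinct cluster labels
```

In this toy setting the within-group cosine distance is near 0 and the between-group distance is near 2, so DBSCAN separates the two cohorts cleanly; in practice the server would run such a step once, at the convergence point detected by the temperature measure.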