🤖 AI Summary
In federated learning (FL), increasing the number of participating clients diminishes the efficacy of each per-round update, slows convergence, and raises communication overhead. To address this, we propose Cohort-Parallel Federated Learning (CPFL), a paradigm that partitions clients into disjoint cohorts: each cohort independently trains its own model to convergence with standard FL, and the resulting models are then fused in a single round of knowledge distillation over a cross-domain, unlabeled dataset. The insight behind CPFL is that small, isolated networks converge faster than a single network in which all nodes participate, relaxing the conventional constraint of synchronously training one global model. Evaluated on CIFAR-10 under a non-IID setting with four cohorts, CPFL reduces total training time by 1.9× and cuts compute and communication costs by 1.3×, while incurring only a marginal drop in test accuracy. The result is a balanced trade-off among convergence speed, resource efficiency, and model performance.
📝 Abstract
Federated Learning (FL) is a machine learning approach in which nodes collaboratively train a global model. As more nodes participate in a round of FL, the effectiveness of each node's individual model update diminishes. In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or cohorts. We introduce Cohort-Parallel Federated Learning (CPFL): a novel learning approach in which each cohort independently trains a global model using FL until convergence, and the models produced by the cohorts are then unified using one-shot Knowledge Distillation (KD) and a cross-domain, unlabeled dataset. The insight behind CPFL is that smaller, isolated networks converge more quickly than in a one-network setting where all nodes participate. Through exhaustive experiments involving realistic traces and non-IID data distributions on the CIFAR-10 and FEMNIST image classification tasks, we investigate the balance between the number of cohorts, model accuracy, training time, and compute and communication resources. Compared to traditional FL, CPFL with four cohorts, a non-IID data distribution, and CIFAR-10 yields a 1.9× reduction in train time and a 1.3× reduction in resource usage, with a minimal drop in test accuracy.
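The fusion step described above can be sketched in miniature: each cohort's converged model acts as a teacher, their predictions on an unlabeled transfer set are averaged into soft targets, and a single student is trained once against those targets. The sketch below is illustrative only, not the paper's implementation; the cohort models are stand-in random linear classifiers, and all dimensions and hyperparameters are assumptions.

```python
# Minimal one-shot knowledge-distillation fusion, in the spirit of CPFL:
# average the cohort teachers' soft predictions on unlabeled data, then
# fit one student model to those soft targets. Teachers here are random
# linear classifiers purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
D, C, N, K = 16, 10, 512, 4     # feature dim, classes, transfer samples, cohorts

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Each cohort independently produced a model (stand-in: random weights).
teachers = [rng.normal(size=(D, C)) for _ in range(K)]
X = rng.normal(size=(N, D))      # cross-domain, unlabeled transfer set

# One-shot fusion: soft targets are the average of teacher predictions.
soft = np.mean([softmax(X @ W) for W in teachers], axis=0)

def ce(p, q):
    # cross-entropy of predictions q against soft targets p
    return -np.mean(np.sum(p * np.log(q + 1e-12), axis=1))

# Distill into a single linear student via gradient descent on the
# cross-entropy between student outputs and the soft targets.
W_s = np.zeros((D, C))
loss_init = ce(soft, softmax(X @ W_s))
for _ in range(300):
    p = softmax(X @ W_s)
    W_s -= 0.5 * (X.T @ (p - soft)) / N   # gradient of cross-entropy

loss_final = ce(soft, softmax(X @ W_s))
# Fraction of transfer samples where the student's top class matches
# the averaged teachers' top class.
agreement = np.mean(softmax(X @ W_s).argmax(1) == soft.argmax(1))
```

Because distillation needs only teacher predictions, not client data, this fusion requires a single communication round per cohort, which is where CPFL's resource savings come from.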