🤖 AI Summary
In large-scale federated learning, partial client participation exacerbates data heterogeneity—such as label and quantity skew—leading to degraded model performance. To address this, we propose KDIA, a framework synergizing knowledge distillation with imbalanced aggregation. Its core contributions are: (1) a weighted teacher-model aggregation mechanism that incorporates client participation frequency, participation count, and local dataset size; (2) a server-side generator that synthesizes near-IID features to facilitate robust teacher–student knowledge transfer; and (3) a unified training objective integrating knowledge distillation, self-distillation, and GAN-based feature generation to enhance generalization under heterogeneity. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 demonstrate that KDIA achieves significantly higher accuracy than baselines under low participation rates and strong data heterogeneity while requiring fewer communication rounds. Notably, the performance gains grow with the degree of heterogeneity, confirming KDIA's effectiveness in challenging real-world FL settings.
📝 Abstract
Federated learning aims to train a global model in a distributed environment whose performance approaches that of centralized training. However, client label skew, data quantity skew, and other forms of heterogeneity severely degrade the model's performance. Most existing methods overlook the scenario in which only a small fraction of clients participates in training within a large-scale client setting, yet our experiments show that this scenario poses a more challenging federated learning task. We therefore propose a Knowledge Distillation with teacher-student Inequitable Aggregation (KDIA) strategy tailored to this setting, which can effectively leverage knowledge from all clients. In KDIA, the student model is the plain average of the participating clients' models, while the teacher model is a weighted aggregation of all clients' models based on three factors: participation intervals, participation counts, and data volume proportions. During local training, self-knowledge distillation is performed. Additionally, a generator trained on the server produces approximately independent and identically distributed (IID) data features locally for auxiliary training. We conduct extensive experiments on the CIFAR-10, CIFAR-100, and CINIC-10 datasets under various heterogeneous settings to evaluate KDIA. The results show that KDIA achieves better accuracy with fewer rounds of training, and the improvement is more significant under severe heterogeneity.
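The teacher-side aggregation described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact rule: the function names and the way the three factors (participation interval, participation count, data-volume proportion) are combined — here, a product of per-factor normalized weights — are assumptions for illustration; the paper defines its own weighting.

```python
import numpy as np

def aggregate_teacher(params, intervals, counts, data_sizes):
    """Illustrative weighted aggregation of ALL client models into a teacher.

    params:     list of per-client parameter vectors (np.ndarray)
    intervals:  rounds since each client last participated (larger = staler)
    counts:     number of rounds each client has participated in
    data_sizes: number of local samples per client

    NOTE: combining the three factors as a product of normalized weights
    is an assumption made for this sketch, not the paper's formula.
    """
    intervals = np.asarray(intervals, dtype=float)
    counts = np.asarray(counts, dtype=float)
    data_sizes = np.asarray(data_sizes, dtype=float)

    # Turn each factor into a distribution over clients
    # (staler clients get smaller interval weight).
    f_interval = 1.0 / (1.0 + intervals)
    f_interval /= f_interval.sum()
    f_count = counts / counts.sum()
    f_data = data_sizes / data_sizes.sum()

    # Combine and renormalize.
    w = f_interval * f_count * f_data
    w /= w.sum()

    # Teacher = weighted average of all client parameter vectors.
    return sum(wi * p for wi, p in zip(w, params))
```

When all three factors are identical across clients, the weights collapse to a uniform average, which matches the student-side plain averaging of participating clients.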