Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning

πŸ“… 2023-05-31
πŸ›οΈ Trans. Mach. Learn. Res.
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address client drift caused by non-IID data in heterogeneous federated learning, this paper proposes an adaptive self-distillation regularization method. The approach innovatively jointly models the global model’s output entropy and the local label distribution to dynamically calibrate client update directions, thereby suppressing deviation from the global optimum. We provide theoretical analysis proving that the method effectively mitigates drift and improves the generalization bound. Importantly, it is fully compatible with mainstream frameworks (e.g., FedAvg, FedProx) and incurs no additional communication overhead. Extensive experiments on multiple real-world non-IID benchmark datasets demonstrate that our method consistently improves global model accuracy by 3.2–5.8 percentage points on average, accelerates convergence significantly, and outperforms existing state-of-the-art approaches across all evaluated metrics.
πŸ“ Abstract
Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid data distributions across clients, FL suffers from the 'client-drift' problem where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to the client's training data based on the global model entropy and the client's label distribution. The proposed regularization can be easily integrated atop existing, state-of-the-art FL algorithms, leading to a further boost in the performance of these off-the-shelf methods. We theoretically explain how ASD reduces client-drift and also explain its generalization ability. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance over state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses client drift in federated learning due to data heterogeneity
Proposes adaptive self-distillation to improve convergence and model performance
Enhances existing FL methods with a regularization technique for non-iid data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive self-distillation regularizes client training
Adjusts regularization based on entropy and label distribution
Integrates with existing federated learning algorithms
πŸ”Ž Similar Papers
No similar papers found.
M
M.Yashwanth
Indian Institute of Science
Gaurav Kumar Nayak
Gaurav Kumar Nayak
Assistant Professor, IIT Roorkee
Machine LearningDeep Learning for Computer VisionData-efficient deep learningGenerative AI
A
Aryaveer Singh
Indian Institute of Science
Yogesh Singh
Yogesh Singh
Indian Institute of Science
A
Anirban Chakraborty
Indian Institute of Science