Knowledge Distillation for Federated Learning: a Practical Guide

📅 2022-11-09
🏛️ International Joint Conference on Artificial Intelligence
📈 Citations: 15
Influential: 0
🤖 AI Summary
Federated Averaging (FedAvg) and other parameter-averaging schemes suffer from well-known limitations under heterogeneous data: enforced model homogeneity, high communication overhead, and degraded performance on non-IID distributions. Method: This paper presents a focused survey of Knowledge Distillation (KD) techniques adapted to Federated Learning (FL). It introduces a novel classification of existing KD-based FL algorithms and characterizes the applicability, trade-offs, and intrinsic limitations of each family of approaches. Contribution: The resulting taxonomy and technical analysis provide a practical guideline for selecting and deploying KD-based FL algorithms that mitigate the weaknesses of parameter averaging, while making explicit the new trade-offs each paradigm introduces.
📝 Abstract
Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. The most widely used algorithms for FL are parameter-averaging schemes (e.g., Federated Averaging) that, however, have well-known limits: enforced model homogeneity, high communication cost, and poor performance in the presence of heterogeneous data distributions. Federated adaptations of regular Knowledge Distillation (KD) can solve or mitigate these weaknesses of parameter-averaging FL algorithms, while possibly introducing other trade-offs. In this article, we present a focused review of the state-of-the-art KD-based algorithms specifically tailored to FL, providing both a novel classification of the existing approaches and a detailed technical description of their pros, cons, and trade-offs.
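At the core of the KD adaptations the abstract refers to is the classic soft-label distillation objective: the student is trained to match the teacher's temperature-softened output distribution. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between teacher and student soft labels,
    scaled by T**2 so gradient magnitude stays comparable across T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T**2)
```

In federated variants, the "teacher" signal is typically an ensemble of client models (or their averaged logits) rather than a single pre-trained network.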
Problem

Research questions and friction points this paper is trying to address.

Addressing model homogeneity in Federated Learning
Reducing communication costs in distributed model training
Improving performance on heterogeneous data distributions
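The friction points above all stem from the server step of parameter-averaging FL. A minimal FedAvg aggregation sketch makes the cost concrete: every client uploads its full parameter vector each round, and all clients must share one architecture (names here are illustrative):

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Server step of FedAvg: weighted average of the clients' full
    parameter vectors, weighted by local dataset size. Each round ships
    the entire model both ways, which is the communication cost (and the
    single-architecture constraint) that KD-based variants aim to relax."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return (weights[:, None] * stacked).sum(axis=0)
```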
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning with Knowledge Distillation
Mitigates parameter-averaging FL limitations
Classifies KD-based FL approaches
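One family of KD-based FL approaches the survey classifies replaces weight exchange with logit exchange on a shared proxy dataset (FedDF-style ensemble distillation). A hedged sketch of the server-side aggregation, with invented names: clients upload logits computed on public data, the server averages them into ensemble soft labels, and the global student is then distilled against those labels:

```python
import numpy as np

def aggregate_soft_labels(client_logits):
    """Average per-client logits on a shared proxy dataset into ensemble
    soft labels. Upload size scales with the proxy set, not the model,
    and client architectures may differ (each shape: samples x classes)."""
    stacked = np.stack([np.asarray(l, dtype=float) for l in client_logits])
    return stacked.mean(axis=0)
```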
Alessio Mora
Assistant Professor, University of Bologna
federated learning · edge computing · IoT
Irene Tenison
Massachusetts Institute of Technology (MIT)
P. Bellavista
University of Bologna
I. Rish
Mila, University of Montreal