Learning Critically: Selective Self-Distillation in Federated Learning on Non-IID Data

📅 2024-12-01
🏛️ IEEE Transactions on Big Data
📈 Citations: 20
Influential: 3
🤖 AI Summary
To address poor generalization, slow convergence, and the divergence of local models from the global optimum that non-IID data causes in federated learning, this paper proposes a selective self-distillation framework. The method introduces a dual-level credibility assessment, operating at both the class and sample level, to dynamically generate fine-grained self-distillation weights, enabling adaptive integration of global knowledge into local training. Crucially, it requires no auxiliary teacher model and no additional communication overhead, and it comes with a theoretical convergence guarantee. Extensive experiments on three standard non-IID benchmark datasets show that the approach significantly improves generalization and robustness, outperforming existing state-of-the-art methods in fewer communication rounds.
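
To make the weighting scheme concrete, below is a minimal PyTorch sketch of a selectively weighted self-distillation loss. The `class_credibility` tensor and the confidence-based sample weight are hypothetical stand-ins for the paper's class- and sample-level credibility measures, whose exact formulas are not reproduced in this summary.

```python
import torch
import torch.nn.functional as F

def selective_self_distillation_loss(student_logits, teacher_logits,
                                     class_credibility, temperature=2.0):
    """Per-sample distillation from the global model's predictions,
    weighted by a hypothetical per-class credibility score.

    student_logits: (B, C) logits from the local model being trained.
    teacher_logits: (B, C) logits from the frozen global model.
    class_credibility: (C,) scores in [0, 1] estimating how trustworthy
        the global model is on each class (an assumption standing in for
        the paper's credibility assessment).
    """
    # Soft targets come from the global model itself -- no auxiliary
    # teacher network or extra communication is needed.
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_probs = F.log_softmax(student_logits / temperature, dim=1)

    # Sample-level weight: the teacher's confidence in its own prediction,
    # modulated by the credibility of the predicted class.
    teacher_conf, teacher_pred = soft_targets.max(dim=1)            # (B,)
    sample_weight = teacher_conf * class_credibility[teacher_pred]  # (B,)

    # Per-sample KL divergence between teacher and student distributions.
    kl_per_sample = F.kl_div(log_probs, soft_targets,
                             reduction="none").sum(dim=1)           # (B,)
    return (sample_weight * kl_per_sample).mean() * temperature ** 2
```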

📝 Abstract
Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping local data decentralized. Data heterogeneity (non-IID) across clients poses significant challenges to FL: local models re-optimize towards their own local optima and forget the global knowledge, resulting in performance degradation and slower convergence. Many existing works attempt to address the non-IID issue by adding an extra global-model-based regularization term to the local training, but without an adaptation scheme, which is not efficient enough to achieve high performance with deep learning models. In this paper, we propose a Selective Self-Distillation method for Federated learning (FedSSD), which imposes adaptive constraints on the local updates by self-distilling the global model's knowledge and selectively weighting it according to its credibility at both the class and sample level. The convergence of FedSSD is theoretically analyzed, and extensive experiments are conducted on three public benchmark datasets, demonstrating that FedSSD achieves better generalization and robustness in fewer communication rounds than other state-of-the-art FL methods.
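
For context, here is a minimal sketch of how such a loss could slot into a client's local round, with the frozen global model serving as its own teacher (reusing `selective_self_distillation_loss` from the sketch above). The `ssd_weight` knob and fixed `class_credibility` input are hypothetical simplifications; the paper derives the weighting adaptively.

```python
import copy
import torch
import torch.nn.functional as F

def local_update(global_model, loader, class_credibility,
                 lr=0.01, ssd_weight=1.0, epochs=1, device="cpu"):
    """One FedSSD-style client round: train a copy of the received global
    model with cross-entropy plus the selectively weighted
    self-distillation term. Only model weights travel back to the server,
    so no extra communication is added over plain FedAvg."""
    teacher = global_model.to(device).eval()           # frozen global model
    student = copy.deepcopy(global_model).to(device).train()
    opt = torch.optim.SGD(student.parameters(), lr=lr)

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():                      # teacher is not updated
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.cross_entropy(s_logits, y) + ssd_weight * \
                selective_self_distillation_loss(s_logits, t_logits,
                                                 class_credibility)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student.state_dict()                        # returned to the server
```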
Problem

Research questions and friction points this paper is trying to address.

Addressing performance degradation in federated learning due to non-IID data
Improving global model convergence in decentralized training environments
Enhancing generalization and robustness with selective self-distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective self-distillation for adaptive constraints
Class and sample level credibility weighting
Federated learning on non-IID data
Yuting He
Foundation Medicine Inc.
Precision Medicine · Biomarker and CDx · Cancer Genomics · Machine Learning · Data Mining
Yiqiang Chen
Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100190
Xiaodong Yang
Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100190
Hanchao Yu
AI at Meta
Multimodal Understanding · Computer Vision · Deep Learning · Medical Image Analysis
Yi-Hua Huang
The University of Hong Kong
Dynamic Reconstruction · Geometry Processing · Computer Graphics · 3D Computer Vision
Yang Gu
Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100190