SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the severe catastrophic forgetting that Sequential Federated Learning (SFL) suffers under data heterogeneity, this paper proposes SFedKD, an SFL framework with discrepancy-aware multi-teacher knowledge distillation. SFedKD extends decoupled knowledge distillation to a multi-teacher setting and assigns fine-grained weights to each teacher's target-class and non-target-class knowledge based on the class-distribution discrepancy between teacher and student data, so that the knowledge most likely to be forgotten is emphasized. To prevent knowledge dilution, it formulates teacher selection as a variant of the maximum coverage problem and devises a complementarity-driven greedy strategy that drops redundant teachers. Extensive experiments on heterogeneous benchmarks show that SFedKD significantly outperforms existing SFL methods, effectively mitigating forgetting while improving global-model generalization, stability, and convergence speed.
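
As a rough illustration of the weighting idea (a minimal sketch, not the authors' implementation), the snippet below splits each teacher's knowledge into a target-class term and a non-target-class term in the style of decoupled knowledge distillation, then scales them by the class-distribution discrepancy between teacher and student data. The total-variation distance between class histograms, the softmax normalisation of teacher weights, and the single shared weight per teacher are illustrative assumptions; the paper assigns distinct weights to target-class and non-target-class knowledge.

```python
# Hedged sketch of discrepancy-aware multi-teacher decoupled KD.
# Not the authors' code; names and weighting details are assumptions.
import torch
import torch.nn.functional as F


def decoupled_kd(student_logits, teacher_logits, targets, temperature=4.0):
    """Split KD into a target-class term (TCKD) and a non-target-class term (NCKD)."""
    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    tgt = F.one_hot(targets, num_classes=p_s.size(1)).bool()

    # Binary distributions over {target class, all other classes} -> TCKD.
    b_s = torch.stack([p_s[tgt], 1.0 - p_s[tgt]], dim=1)
    b_t = torch.stack([p_t[tgt], 1.0 - p_t[tgt]], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean")

    # Distributions restricted to non-target classes -> NCKD
    # (masking the target logit approximates excluding it).
    nt_s = F.log_softmax(student_logits.masked_fill(tgt, -1e9) / temperature, dim=1)
    nt_t = F.softmax(teacher_logits.masked_fill(tgt, -1e9) / temperature, dim=1)
    nckd = F.kl_div(nt_s, nt_t, reduction="batchmean")
    return tckd, nckd


def multi_teacher_dkd_loss(student_logits, teacher_logits_list, targets,
                           student_dist, teacher_dists, temperature=4.0):
    """student_dist / teacher_dists: 1-D tensors of per-class data proportions.
    Assumption: teachers whose class distribution differs more from the
    student's data get larger weights (a proxy for knowledge the student lacks)."""
    discrepancies = torch.stack([
        0.5 * (td - student_dist).abs().sum() for td in teacher_dists
    ])
    weights = torch.softmax(discrepancies, dim=0)  # normalise across teachers

    loss = student_logits.new_zeros(())
    for w, t_logits in zip(weights, teacher_logits_list):
        tckd, nckd = decoupled_kd(student_logits, t_logits, targets, temperature)
        # Illustrative choice: one shared weight per teacher scales both terms.
        loss = loss + w * (tckd + nckd)
    return loss * temperature ** 2
```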

📝 Abstract
Federated Learning (FL) is a distributed machine learning paradigm in which a central server coordinates multiple clients to collaboratively train a global model. Sequential Federated Learning (SFL) is a newly emerging FL training framework where the global model is trained in a sequential manner across clients. Since SFL can provide strong convergence guarantees under data heterogeneity, it has attracted significant research attention in recent years. However, experiments show that SFL suffers from severe catastrophic forgetting in heterogeneous environments, meaning that the model tends to forget knowledge learned from previous clients. To address this issue, we propose an SFL framework with discrepancy-aware multi-teacher knowledge distillation, called SFedKD, which selects multiple models from the previous round to guide the current round of training. In SFedKD, we extend the single-teacher Decoupled Knowledge Distillation approach to our multi-teacher setting and assign distinct weights to teachers' target-class and non-target-class knowledge based on the class distributional discrepancy between teacher and student data. Through this fine-grained weighting strategy, SFedKD can enhance model training efficacy while mitigating catastrophic forgetting. Additionally, to prevent knowledge dilution, we eliminate redundant teachers from the knowledge distillation and formalize teacher selection as a variant of the maximum coverage problem. Based on a greedy strategy, we design a complementarity-based teacher selection mechanism to ensure that the selected teachers achieve comprehensive knowledge-space coverage while reducing communication and computational costs. Extensive experiments show that SFedKD effectively overcomes catastrophic forgetting in SFL and outperforms state-of-the-art FL methods.
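
The complementary teacher selection described in the abstract can be read as a greedy heuristic for a maximum-coverage variant. The sketch below (again, not the authors' code) illustrates the idea under the simplifying assumption that each candidate teacher's knowledge is summarised by the set of classes its client holds; the function name select_teachers and the budget parameter are hypothetical.

```python
# Hedged sketch of complementarity-driven greedy teacher selection.
from typing import Dict, List, Set


def select_teachers(class_sets: Dict[str, Set[int]], budget: int) -> List[str]:
    """Greedily pick up to `budget` teachers, each adding the largest number
    of not-yet-covered classes (classic greedy max-coverage heuristic)."""
    covered: Set[int] = set()
    selected: List[str] = []
    candidates = dict(class_sets)

    while candidates and len(selected) < budget:
        best = max(candidates, key=lambda c: len(candidates[c] - covered))
        gain = candidates[best] - covered
        if not gain:          # every remaining teacher is redundant: stop early
            break
        selected.append(best)
        covered |= gain
        del candidates[best]
    return selected


# Example: three previous-round clients with overlapping class sets.
teachers = {"client_a": {0, 1, 2}, "client_b": {2, 3}, "client_c": {1, 2}}
print(select_teachers(teachers, budget=2))  # -> ['client_a', 'client_b']
```

Stopping early once no candidate adds new classes is what keeps redundant teachers out of the distillation, which is the knowledge-dilution concern the abstract raises.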
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in Sequential Federated Learning
Enhances training via multi-teacher knowledge distillation
Reduces redundancy in teacher models for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-teacher knowledge distillation for SFL
Discrepancy-aware weighting for class knowledge
Greedy, complementarity-driven teacher selection to reduce redundancy
Authors

Haotian Xu
University of Science and Technology of China

Jinrui Zhou
University of Science and Technology of China

Xichong Zhang
University of Science and Technology of China

Mingjun Xiao
University of Science and Technology of China
Research interests: Mobile Computing, Crowdsensing, Mobile Social Network, Vehicular Network

He Sun
University of Science and Technology of China

Yin Xu
Beijing Jiaotong University
Research interests: Power Grid Resilience, Electricity-Transportation Integrated System, Power System High-Performance Simulation