FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses negative transfer in heterogeneous federated learning, where statistical heterogeneity and model asymmetry undermine performance. Existing federated knowledge distillation methods have limited robustness because they rely on public data or assume that transferred knowledge is universally applicable. To overcome these limitations, the paper proposes FedeKD, a framework that explicitly incorporates sample-level knowledge credibility into the federated distillation process. In FedeKD, each client jointly trains a large private model and a lightweight shared proxy model, while the server aggregates the clients' proxy models into a global proxy that guides local updates. An energy-based gating mechanism, which requires no public data, dynamically evaluates and weights the reliability of sample-level knowledge for transfer. Experiments on six real-world datasets show that FedeKD significantly mitigates negative transfer and maintains superior performance even under extreme data heterogeneity.
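The server-side step described above, in which client proxy models are combined into a global proxy, might look roughly like the following. This is a minimal sketch, not the paper's procedure: the summary does not state the aggregation rule, so a FedAvg-style average weighted by local sample count is assumed here. The names `aggregate_proxies`, `client_params`, and `client_sizes` are hypothetical.

```python
import numpy as np

def aggregate_proxies(client_params, client_sizes):
    """Average client proxy parameters, weighted by local sample count.

    ASSUMPTION: FedAvg-style weighting; the source does not specify the rule.
    client_params: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of local dataset sizes, one per client
    """
    total = float(sum(client_sizes))
    names = client_params[0].keys()
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in names
    }

# Two toy clients sharing the same lightweight proxy architecture.
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
]
global_proxy = aggregate_proxies(clients, client_sizes=[100, 300])
```

Only the lightweight proxy parameters appear in this exchange. The private models stay on the clients, which is what allows them to differ in architecture.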

📝 Abstract

Federated learning (FL) operates in heterogeneous environments, where variations in data distributions and asymmetric model design often result in negative transfer. While federated knowledge distillation (FKD) avoids direct model parameter sharing, existing methods typically rely on public datasets or assume that transferred knowledge is uniformly reliable, which limits their robustness in practice. This paper presents FedeKD, a reliability-aware FKD framework that makes sample-wise trust estimation an explicit component of knowledge transfer, without relying on additional public data. Each client maintains a high-capacity private model for local learning and a lightweight shared proxy model for cross-client knowledge exchange. During training, proxy models are aggregated on the server to form a global proxy, which is then used to guide updates of the private models. At the core of FedeKD is an energy-based gating mechanism that converts task-specific private-proxy disagreement into sample-wise trust weights for backward distillation. The proxy model therefore contributes more on samples where its knowledge is judged reliable and is down-weighted on unreliable ones. Extensive experiments on six real-world datasets demonstrate that FedeKD significantly reduces negative transfer under heterogeneous settings while maintaining strong predictive performance.
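To make the gating idea concrete, here is a minimal sketch of how private-proxy disagreement could be turned into sample-wise trust weights for distillation. The abstract does not give the energy function or the gate, so everything specific below is an assumption: disagreement is measured as the per-sample KL divergence between proxy and private predictions, that value is treated as an energy, and a Boltzmann-style gate `exp(-energy / tau)` maps it to a weight in (0, 1]. The function names and the temperature `tau` are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def kl_per_sample(logp, logq):
    """Row-wise KL(p || q) from log-probabilities."""
    return (np.exp(logp) * (logp - logq)).sum(axis=1)

def energy_gate(private_logits, proxy_logits, tau=1.0):
    """Map private-proxy disagreement to trust weights in (0, 1].

    ASSUMPTION: energy = KL(proxy || private) and weight = exp(-energy / tau).
    The paper's actual energy function and gate may differ.
    """
    energy = kl_per_sample(log_softmax(proxy_logits), log_softmax(private_logits))
    energy = np.maximum(energy, 0.0)  # guard against tiny negative round-off
    return np.exp(-energy / tau)

def gated_distill_loss(private_logits, proxy_logits, tau=1.0):
    """Distillation loss in which each sample's KL term is scaled by its trust weight."""
    weights = energy_gate(private_logits, proxy_logits, tau)
    kl = kl_per_sample(log_softmax(proxy_logits), log_softmax(private_logits))
    return float((weights * kl).mean())

# Sample 0: the models agree. Sample 1: they predict different classes.
private = np.array([[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
proxy = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
weights = energy_gate(private, proxy)
loss = gated_distill_loss(private, proxy)
```

In this toy case the agreeing sample keeps full weight while the conflicting sample is strongly down-weighted, so the proxy's influence on the private model is concentrated where the two models are consistent. Treating agreement as a proxy for reliability is itself an assumption of the sketch.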

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Heterogeneous Settings

Negative Transfer

Knowledge Distillation

Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Energy-Based Gating

Federated Knowledge Distillation

Sample-Wise Trust Estimation