🤖 AI Summary
In federated learning, high client heterogeneity severely degrades the quality of server-side pseudo-labels. Method: We propose a GAN-inspired discriminator-cooperation framework: the server trains a generator and distributes it to clients, where lightweight discriminators are trained locally against it and local data; pseudo-label consistency is jointly optimized. We further introduce a provably near-optimal client-weighting mechanism to achieve high-quality pseudo-label aggregation. Integrating federated learning, knowledge distillation, and generative modeling, our approach improves global model performance without requiring server-side labeled data. Results: Our method consistently outperforms state-of-the-art baselines across multiple image classification benchmarks. It incurs negligible additional communication overhead (<0.5%), ensures controllable privacy leakage, and imposes minimal extra computational cost on clients (<3%).
📝 Abstract
Federated ensemble distillation addresses client heterogeneity by generating pseudo-labels for an unlabeled server dataset based on client predictions and training the server model using the pseudo-labeled dataset. The unlabeled server dataset can either be pre-existing or generated through a data-free approach. The effectiveness of this approach critically depends on the method of assigning weights to client predictions when creating pseudo-labels, especially in highly heterogeneous settings. Inspired by theoretical results from GANs, we propose a provably near-optimal weighting method that leverages client discriminators trained with a server-distributed generator and local datasets. Our experiments on various image classification tasks demonstrate that the proposed method significantly outperforms baselines. Furthermore, we show that the additional communication cost, client-side privacy leakage, and client-side computational overhead introduced by our method are negligible, both in scenarios with and without a pre-existing server dataset.
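The core step described above, combining weighted client predictions into server-side pseudo-labels, can be sketched as follows. This is a rough illustration under assumptions, not the paper's exact algorithm: the per-client, per-sample weights are simply taken as given (in the proposed method they would be derived from client discriminator scores), and pseudo-labels are formed as a weighted average of client softmax outputs.

```python
import numpy as np

def aggregate_pseudo_labels(client_probs, weights):
    """Blend per-client class probabilities into server pseudo-labels.

    client_probs: shape (num_clients, num_samples, num_classes)
        -- each client's softmax predictions on the unlabeled server set.
    weights: shape (num_clients, num_samples)
        -- non-negative per-client, per-sample weights (assumed given here;
        the paper derives them from client discriminators).
    """
    client_probs = np.asarray(client_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalize weights across clients for each sample.
    norm = weights / weights.sum(axis=0, keepdims=True)
    # Weighted average over the client axis -> (num_samples, num_classes).
    return np.einsum("kn,knc->nc", norm, client_probs)

# Toy example: 2 clients, 1 unlabeled server sample, 3 classes.
probs = [[[0.7, 0.2, 0.1]],
         [[0.1, 0.2, 0.7]]]
w = [[3.0], [1.0]]  # client 0 weighted 3x more on this sample
pseudo = aggregate_pseudo_labels(probs, w)
print(pseudo)  # each row is a valid distribution (sums to 1)
```

The server model would then be distilled on the unlabeled dataset against these pseudo-labels; the weighting scheme is exactly the component the proposed method makes provably near-optimal.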