Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses performance and resource imbalances in federated learning caused by non-IID data distributions, device heterogeneity, and inefficient communication. To this end, the authors propose FedKDNAS, a framework that integrates client-side distributed neural architecture search with server-guided knowledge distillation. Each client autonomously selects a lightweight model under accuracy and resource constraints and contributes logits to a global distillation process. Key components include a hybrid supervised-distillation objective, a logit-sharing mechanism over a common public reference set, and server-side prediction smoothing with optional teacher-model fusion. Experiments on six datasets show that FedKDNAS outperforms six representative FL baselines, achieving up to a 15% accuracy gain under non-IID conditions, roughly a 28% reduction in client CPU usage, and up to a 44-fold decrease in communication overhead.
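To make the idea of "selecting a lightweight model under accuracy and resource constraints" concrete, here is a minimal sketch. It assumes each client already has a small pool of candidate architectures with estimated accuracy, relative CPU cost, and parameter count; the `Candidate` type, the candidate names, the budget parameters, and all numbers are hypothetical. This is a simple constrained selection over a fixed pool, not the paper's actual search procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical architecture identifier
    est_accuracy: float   # accuracy estimated on local validation data (assumed given)
    cpu_cost: float       # relative CPU cost per training step (hypothetical units)
    num_params: int


def select_architecture(candidates, max_cpu, max_params, min_accuracy=0.0):
    """Return the most accurate candidate that fits the client's budget.

    If no candidate satisfies every constraint, fall back to the cheapest one
    so that even very constrained clients can still participate.
    """
    feasible = [
        c for c in candidates
        if c.cpu_cost <= max_cpu
        and c.num_params <= max_params
        and c.est_accuracy >= min_accuracy
    ]
    if not feasible:
        return min(candidates, key=lambda c: (c.cpu_cost, c.num_params))
    # Highest estimated accuracy wins; ties are broken by lower CPU cost.
    return max(feasible, key=lambda c: (c.est_accuracy, -c.cpu_cost))


pool = [
    Candidate("cnn-tiny", 0.71, 1.0, 60_000),
    Candidate("cnn-small", 0.78, 2.5, 250_000),
    Candidate("cnn-medium", 0.82, 6.0, 1_200_000),
]

# A constrained device and a stronger device end up with different models.
weak = select_architecture(pool, max_cpu=3.0, max_params=500_000)
strong = select_architecture(pool, max_cpu=10.0, max_params=2_000_000)
print(weak.name, strong.name)  # cnn-small cnn-medium
```

Because clients later exchange only predictions rather than weights, nothing in this step requires the chosen architectures to match across clients.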
📝 Abstract
Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, which limits adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with server-coordinated knowledge distillation. Each client autonomously selects a lightweight model under accuracy-resource constraints, trains it locally using a hybrid objective that combines supervised learning and knowledge distillation, and shares only its predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by up to 44 times while maintaining lightweight logit-based communication.
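One round of the logit-sharing loop described above can be sketched in NumPy. This is a forward-only illustration under stated assumptions: the hybrid objective is taken to be a standard weighted sum of cross-entropy on private labels and temperature-scaled KL divergence toward the server targets on the reference set; "smoothing" is modeled as an exponential moving average across rounds; and teacher fusion is a fixed convex mix. The function names and the `alpha`, `temperature`, `momentum`, and `teacher_weight` parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hybrid_loss(private_logits, private_labels, ref_logits, ref_targets,
                alpha=0.5, temperature=2.0):
    """alpha * CE(private labels) + (1 - alpha) * T^2 * KL(targets || student) on the reference set."""
    p = softmax(private_logits)
    ce = -np.mean(np.log(p[np.arange(len(private_labels)), private_labels] + 1e-12))
    q = softmax(ref_logits, temperature)
    kl = np.sum(ref_targets * (np.log(ref_targets + 1e-12) - np.log(q + 1e-12)), axis=-1)
    kd = temperature ** 2 * np.mean(kl)
    return alpha * ce + (1 - alpha) * kd


def server_targets(client_ref_logits, prev_targets=None, teacher_logits=None,
                   momentum=0.5, teacher_weight=0.3, temperature=2.0):
    """Average client soft predictions, optionally fuse a teacher, then EMA-smooth across rounds."""
    probs = np.mean([softmax(l, temperature) for l in client_ref_logits], axis=0)
    if teacher_logits is not None:
        # Convex mix with the teacher's soft predictions (hypothetical fusion rule).
        probs = (1 - teacher_weight) * probs + teacher_weight * softmax(teacher_logits, temperature)
    if prev_targets is not None:
        # Exponential moving average over rounds (assumed form of "smoothing").
        probs = momentum * prev_targets + (1 - momentum) * probs
    return probs


rng = np.random.default_rng(0)
num_ref, num_classes = 8, 10
# Each client uploads only its logits on the shared reference set.
client_logits = [rng.normal(size=(num_ref, num_classes)) for _ in range(3)]

round1 = server_targets(client_logits)
round2 = server_targets(client_logits, prev_targets=round1,
                        teacher_logits=rng.normal(size=(num_ref, num_classes)))
loss = hybrid_loss(rng.normal(size=(16, num_classes)),
                   rng.integers(0, num_classes, size=16),
                   client_logits[0], round2)
print(round2.shape, np.allclose(round2.sum(axis=1), 1.0), loss > 0)
```

Every step is a convex combination of probability distributions, so the targets remain valid distributions. Each client's upload is one `num_ref × num_classes` array per round, whatever the size of its local model; in practice the loss would be differentiated with an autodiff framework rather than computed in plain NumPy.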
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Non-IID
System Heterogeneity
Communication Efficiency
Model Architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Neural Architecture Search
Knowledge Distillation
Non-IID
Communication Efficiency