Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance and resource imbalances in federated learning caused by non-IID data distributions, device heterogeneity, and inefficient communication. The authors propose FedKDNAS, a framework that integrates client-side distributed neural architecture search with server-guided knowledge distillation. Each client autonomously discovers a lightweight model under accuracy and resource constraints and contributes its logits to a global distillation process. Key innovations include a hybrid supervised-distillation objective, a logit-sharing mechanism over a common reference set, and server-side strategies involving prediction smoothing and teacher model fusion. Experiments across six datasets show that FedKDNAS outperforms six state-of-the-art methods, with up to a 15% accuracy gain under non-IID conditions, an approximately 28% reduction in client CPU usage, and up to a 44-fold decrease in communication overhead.
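The hybrid supervised-distillation objective mentioned above could look roughly like the following sketch. The summary does not give the actual formula, so the weighting `alpha`, the `temperature`, the use of KL divergence for the distillation term, and every function name here are illustrative assumptions rather than the authors' definition.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss on one private, labelled example."""
    return -math.log(softmax(logits)[label])


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two probability vectors."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
    )


def hybrid_loss(private_logits, label, reference_logits, server_targets,
                alpha=0.5, temperature=2.0):
    """Assumed form: (1 - alpha) * supervised + alpha * distillation.

    private_logits   -- client output on a private labelled sample
    reference_logits -- client output on a shared reference sample
    server_targets   -- server-provided soft target for that reference sample
    alpha, temperature are hypothetical hyperparameters.
    """
    supervised = cross_entropy(private_logits, label)
    student = softmax(reference_logits, temperature)
    distill = kl_divergence(server_targets, student)
    return (1.0 - alpha) * supervised + alpha * distill
```

With `alpha=0.0` the sketch reduces to plain supervised training, and with `alpha=1.0` the client only matches the server targets on the reference set, which is one way to picture the trade-off the hybrid objective balances.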

📝 Abstract

Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, which limits adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with server-coordinated knowledge distillation. Each client autonomously selects a lightweight model under accuracy-resource constraints, trains that model locally using a hybrid objective combining supervised learning and knowledge distillation, and shares only its predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by a factor of up to 44 while maintaining lightweight logit-based communication.
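To make the server-side step more concrete, here is a minimal sketch of how per-client predictions on the public reference set might be turned into distillation targets. The abstract does not specify the aggregation rule, so three choices here are assumptions made purely for illustration: averaging client softmax outputs, smoothing with an exponential moving average over rounds, and fusing the teacher through a convex mix. The names `aggregate_round`, `momentum`, and `teacher_weight` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_round(client_logits, previous_targets=None, teacher_probs=None,
                    temperature=2.0, momentum=0.5, teacher_weight=0.3):
    """Build soft targets for the next round (illustrative only).

    client_logits    -- client_logits[c][i] is client c's logit vector
                        for reference sample i
    previous_targets -- last round's targets, used for smoothing (optional)
    teacher_probs    -- teacher probabilities per reference sample (optional)
    """
    n_clients = len(client_logits)
    n_samples = len(client_logits[0])
    targets = []
    for i in range(n_samples):
        # 1. Aggregate: average the clients' softened predictions.
        probs = [softmax(client[i], temperature) for client in client_logits]
        n_classes = len(probs[0])
        target = [sum(p[k] for p in probs) / n_clients
                  for k in range(n_classes)]
        # 2. Smooth: blend with the previous round's target (assumed EMA).
        if previous_targets is not None:
            target = [momentum * old + (1.0 - momentum) * new
                      for old, new in zip(previous_targets[i], target)]
        # 3. Optionally fuse with a teacher model's prediction.
        if teacher_probs is not None:
            target = [teacher_weight * t + (1.0 - teacher_weight) * s
                      for t, s in zip(teacher_probs[i], target)]
        targets.append(target)
    return targets
```

Because only per-sample prediction vectors cross the network in this sketch, clients with different architectures can participate as long as they agree on the reference set and the label space.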

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Non-IID

System Heterogeneity

Communication Efficiency

Model Architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Neural Architecture Search

Knowledge Distillation