Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

📅 2024-09-26

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

219K/year

🤖 AI Summary

To address statistical heterogeneity and high communication overhead in federated learning (FL) under severe label imbalance in mobile edge–cloud networks, this paper proposes the Hybrid Federated Learning framework with Distillation and Clustering (HFLDD). The method first performs heterogeneous clustering of clients based on data distribution to achieve inter-cluster label balance, then introduces dataset distillation within each cluster to compress and transfer representative samples to cluster heads—thereby mitigating Non-IID bias and reducing communication load. HFLDD integrates hierarchical aggregation (cluster head → server) and Non-IID-aware model evaluation. Experiments under strong label skew demonstrate that HFLDD achieves up to a 12.3% improvement in test accuracy over baselines including FedAvg and FedProx, while reducing total communication rounds by 37%. The key contributions are: (i) the novel co-design of heterogeneous clustering and cluster-head distillation; (ii) effective alleviation of both statistical heterogeneity and communication bottlenecks in resource-constrained edge–cloud FL settings.

Technology Category

Application Category

📝 Abstract

In federated learning, the heterogeneity of client data has a great impact on the performance of model training. Many heterogeneity issues in this process are raised by non-independently and identically distributed (Non-IID) data. This study focuses on the issue of label distribution skew. To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and equally distributed (IID) data, thereby improving the performance of model training. Particularly, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster headers collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server. This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training. Furthermore, we compare our proposed method with typical baseline methods on public datasets. Experimental results demonstrate that when the data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.

Problem

Research questions and friction points this paper is trying to address.

Addressing non-IID data challenges in federated learning

Reducing communication overhead in edge-cloud FL systems

Improving model accuracy with dataset distillation techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid FL with dataset distillation

Cluster-based IID data generation

Reduced communication and computational overhead

🔎 Similar Papers

No similar papers found.