Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address statistical heterogeneity and high communication overhead in federated learning (FL) under severe label imbalance in mobile edge–cloud networks, this paper proposes the Hybrid Federated Learning framework with Distillation and Clustering (HFLDD). The method first performs heterogeneous clustering of clients based on data distribution to achieve inter-cluster label balance, then introduces dataset distillation within each cluster to compress and transfer representative samples to cluster heads—thereby mitigating Non-IID bias and reducing communication load. HFLDD integrates hierarchical aggregation (cluster head → server) and Non-IID-aware model evaluation. Experiments under strong label skew demonstrate that HFLDD achieves up to a 12.3% improvement in test accuracy over baselines including FedAvg and FedProx, while reducing total communication rounds by 37%. The key contributions are: (i) the novel co-design of heterogeneous clustering and cluster-head distillation; (ii) effective alleviation of both statistical heterogeneity and communication bottlenecks in resource-constrained edge–cloud FL settings.

Technology Category

Application Category

📝 Abstract
In federated learning, the heterogeneity of client data has a great impact on the performance of model training. Many heterogeneity issues in this process are raised by non-independently and identically distributed (Non-IID) data. This study focuses on the issue of label distribution skew. To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and equally distributed (IID) data, thereby improving the performance of model training. Particularly, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster headers collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server. This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training. Furthermore, we compare our proposed method with typical baseline methods on public datasets. Experimental results demonstrate that when the data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.
Problem

Research questions and friction points this paper is trying to address.

Addressing non-IID data challenges in federated learning
Reducing communication overhead in edge-cloud FL systems
Improving model accuracy with dataset distillation techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid FL with dataset distillation
Cluster-based IID data generation
Reduced communication and computational overhead
🔎 Similar Papers
No similar papers found.
X
Xiufang Shi
College of Information Science, Zhejiang University of Technology, Hangzhou 310023, China
W
Wei Zhang
College of Information Science, Zhejiang University of Technology, Hangzhou 310023, China
M
Mincheng Wu
College of Information Science, Zhejiang University of Technology, Hangzhou 310023, China
G
Guangyi Liu
College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Zhenyu Wen
Zhenyu Wen
Zhejiang University of Technology
AI SystemCloud computingSocial computingDistributed computing
Shibo He
Shibo He
Professor, College of Control Science and Engineering, Zhejiang university
Internet of ThingsBig DataNetwork Science
Tejal Shah
Tejal Shah
Newcastle University
ontologyOWLinformaticsdata integrationknowledge representation
R
R. Ranjan
Computing Science and Internet of Things, Newcastle University, NE1 7RU Newcastle, U.K.