🤖 AI Summary
To address training instability and accuracy degradation in federated learning under highly heterogeneous data (e.g., label skew with α = 0.1), this paper proposes HeteRo-Select, a client selection framework whose score jointly accounts for utility, fairness, update speed, and data diversity. It combines high-loss-priority sampling with diversity-enhancing strategies to mitigate selection bias and improve generalization, and the authors establish convergence guarantees under strong regularization. Experiments on CIFAR-10 show that HeteRo-Select reaches a peak accuracy of 74.75% and a final accuracy of 72.76%, a stability drop (the gap between peak and final accuracy) of only 1.99 percentage points. The Oort baseline reaches 73.98% and 71.25%, a drop of 2.73 points. The accompanying analysis also indicates that, under high heterogeneity, selecting a subset of clients can reduce communication relative to full participation.
📝 Abstract
Federated Learning (FL) often suffers from training instability caused by heterogeneous client data. Utility-based client selection methods such as Oort speed up convergence by prioritizing high-loss clients, but they frequently suffer significant accuracy drops in later stages of training. We propose HeteRo-Select, a theoretically grounded client selection framework designed to maintain high performance and ensure long-term training stability. Our theoretical analysis shows that when client data is highly heterogeneous, selecting a well-chosen subset of clients can reduce communication more effectively than full participation. HeteRo-Select uses a transparent, step-by-step scoring system that accounts for client utility, fairness, update speed, and data diversity, and we establish convergence guarantees for it under strong regularization. Experiments on CIFAR-10 under severe label skew ($\alpha=0.1$) support the theoretical findings: HeteRo-Select outperforms existing approaches in peak accuracy, final accuracy, and training stability. Specifically, it achieves a peak accuracy of $74.75\%$, a final accuracy of $72.76\%$, and a stability drop of only $1.99\%$. In contrast, Oort records a peak accuracy of $73.98\%$, a final accuracy of $71.25\%$, and a stability drop of $2.73\%$. Together, the theoretical foundations and empirical results make HeteRo-Select a reliable option for real-world heterogeneous FL problems.
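To make the scoring idea concrete, here is a minimal Python sketch of a client selector that combines the four signals named in the abstract (utility, fairness, update speed, data diversity). The abstract does not give the actual scoring formula, so everything below is an assumption made for illustration: the field names, the min-max normalization, the entropy-based diversity term, the weights, and the softmax sampling step are hypothetical and should not be read as the authors' method. The `stability_drop` helper assumes the metric is peak accuracy minus final accuracy, which is consistent with the figures reported for both methods.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Per-client signals. All field names are hypothetical."""
    client_id: int
    loss: float           # recent local training loss (utility signal)
    times_selected: int   # past participation count (fairness signal)
    update_norm: float    # size of the last model update (update-speed signal)
    label_counts: list    # local label histogram (diversity signal)


def label_entropy(counts):
    """Shannon entropy of a label histogram, normalized to [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(counts))


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_clients(clients, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of the four signals (weights are an assumption)."""
    w_u, w_f, w_s, w_d = weights
    utility = min_max([c.loss for c in clients])
    # Rarely selected clients receive a higher fairness term.
    fairness = min_max([-c.times_selected for c in clients])
    speed = min_max([c.update_norm for c in clients])
    diversity = [label_entropy(c.label_counts) for c in clients]
    return {
        c.client_id: w_u * u + w_f * f + w_s * s + w_d * d
        for c, u, f, s, d in zip(clients, utility, fairness, speed, diversity)
    }


def select_clients(clients, k, temperature=1.0, rng=None):
    """Sample k distinct clients, favoring high scores via a softmax."""
    rng = rng or random.Random()
    scores = score_clients(clients)
    pool = list(scores)
    chosen = []
    for _ in range(min(k, len(pool))):
        top = max(scores[c] for c in pool)  # subtract max for numerical safety
        probs = [math.exp((scores[c] - top) / temperature) for c in pool]
        pick = rng.choices(pool, weights=probs, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sample without replacement
    return chosen


def stability_drop(peak_acc, final_acc):
    """Assumed definition: peak accuracy minus final accuracy."""
    return peak_acc - final_acc
```

Sampling from a softmax rather than always taking the top-k scores is one way to keep some randomness in selection, so low-scoring clients are not excluded forever. Whether HeteRo-Select does this is not stated in the abstract.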