AI Summary
In federated learning, the deep coupling between statistical heterogeneity (non-IID data distributions) and system heterogeneity (device stragglers) causes severe update staleness, degrading model accuracy and convergence efficiency. This paper is the first to jointly model and address their intrinsic interdependence via a zero-overhead transformation framework: it requires neither auxiliary datasets nor fully trained local models; instead, it inversely estimates local data distributions from gradient or model-update statistics, enabling dynamic stale-to-fresh update calibration. It further introduces unsupervised parameter reweighting and an asynchronous optimization mechanism. Evaluated on standard benchmarks (e.g., CIFAR-10/100, FEMNIST) with diverse models (e.g., ResNet, CNN), the method improves final test accuracy by up to 25% and reduces the number of required training epochs by up to 35%, while imposing zero additional computation or communication overhead on clients.
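The "inverse estimation" mentioned above can be grounded in a standard property of softmax classifiers: with cross-entropy loss, the batch-averaged gradient of the final-layer bias equals E[softmax(z) - onehot(y)], so heavily represented classes drive their bias-gradient entries negative. The sketch below illustrates that intuition in NumPy; the function name and the clipping heuristic are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def estimate_label_distribution(bias_grad: np.ndarray) -> np.ndarray:
    """Heuristically infer a client's label distribution from the
    cross-entropy gradient of its classifier's final-layer bias.
    Over-represented classes push their bias-gradient entries negative,
    so we flip the sign, keep the positive part, and normalize.
    Illustrative sketch only -- not the paper's estimator."""
    scores = np.clip(-bias_grad, 0.0, None)
    total = scores.sum()
    if total == 0.0:
        # Degenerate gradient: fall back to a uniform distribution.
        return np.full(bias_grad.shape, 1.0 / bias_grad.size)
    return scores / total

# Example: a bias gradient suggesting the client's data is skewed to class 0.
print(estimate_label_distribution(np.array([-0.6, -0.1, 0.3, 0.4])))
# -> [0.857 0.143 0.    0.   ]
```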
Abstract
Federated Learning (FL) can be affected by data and device heterogeneities, caused by clients' different local data distributions and latencies in uploading model updates (i.e., staleness). Traditional schemes treat these heterogeneities as two separate and independent aspects, but this assumption is unrealistic in practical FL scenarios where the two are intertwined. In these cases, traditional FL schemes are ineffective, and a better approach is to convert a stale model update into an unstale one. In this paper, we present a new FL framework that ensures the accuracy and computational efficiency of this conversion, hence effectively tackling the intertwined heterogeneities that may cause unlimited staleness in model updates. Our basic idea is to estimate the distributions of clients' local training data from their uploaded stale model updates, and to use these estimates to compute unstale client model updates. In this way, our approach requires neither an auxiliary dataset nor fully trained client local models, and incurs no additional computation or communication overhead at client devices. We compared our approach with existing FL strategies on mainstream datasets and models, and showed that it can improve the trained model accuracy by up to 25% and reduce the number of required training epochs by up to 35%. Source code is available at: https://github.com/pittisl/FL-with-intertwined-heterogeneity.
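To illustrate how such distribution estimates might feed into staleness-aware aggregation at the server, here is a hypothetical sketch. The weighting rule, the decay parameter, and the function name are assumptions made for illustration; the paper's method computes genuinely unstale updates from the estimated distributions rather than merely rescaling stale ones.

```python
import numpy as np

def calibrate_stale_update(stale_update, staleness, est_dist, global_dist,
                           decay=0.9):
    """Hypothetical staleness-aware reweighting: decay a stale update by
    its age, but attenuate it less when the client's estimated label
    distribution covers classes the aggregate distribution lacks.
    Illustrative sketch only -- not the paper's conversion."""
    overlap = float(np.minimum(est_dist, global_dist).sum())  # in [0, 1]
    weight = overlap * decay ** staleness + (1.0 - overlap)
    return [layer * weight for layer in stale_update]

# A client 5 rounds stale whose data fills a gap in the global mix keeps
# most of its weight (~0.92); a redundant client of equal age decays to ~0.59.
update = [np.ones((2, 2))]
rare = calibrate_stale_update(update, 5, np.array([0.9, 0.1]),
                              np.array([0.1, 0.9]))
```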