AI Summary
Under non-independent and identically distributed (Non-IID) client data, federated learning (FL) has only weak convergence guarantees, which typically hold only in expectation.
Method: This paper introduces stochastic approximation (SA) theory into FL and proposes a client-adaptive decaying step-size mechanism, where local update step sizes decrease heterogeneously across clients and iterations, ensuring almost-sure convergence of the global model.
Contribution/Results: It is the first work to systematically embed the SA framework into FL, rigorously proving that the aggregation weights asymptotically track an autonomous ordinary differential equation (ODE), thereby establishing a solid theoretical foundation for convergence. The method dynamically adjusts each client's contribution to the global model, enhancing robustness to data rarity. Experiments under Non-IID settings demonstrate superior convergence stability and higher final accuracy compared to FedAvg and FedProx, validating both the theoretical tracking property and strong generalization capability.
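For orientation, the ODE-tracking claim can be read against the textbook Robbins-Monro form of a stochastic approximation iterate. The notation below ($w_n^{(i)}$ for the weights of client $i$, $a_n$ for the step size, $F_i$ for the local cost, $M_{n+1}^{(i)}$ for the gradient noise, $h$ for the mean field) is assumed for illustration and is not taken from the paper, whose exact recursion and limiting ODE may differ.

```latex
% Generic SA iterate on client i (assumed notation, not the paper's):
w_{n+1}^{(i)} = w_n^{(i)} - a_n \left[ \nabla F_i\!\left(w_n^{(i)}\right) + M_{n+1}^{(i)} \right]

% Standard tapering step-size conditions:
\sum_{n=0}^{\infty} a_n = \infty, \qquad \sum_{n=0}^{\infty} a_n^2 < \infty

% Under such conditions the iterates typically track an autonomous ODE
% of the form below; for a gradient scheme, h(w) = -\nabla F(w):
\dot{w}(t) = h\bigl(w(t)\bigr)
```

A common choice satisfying both step-size conditions is $a_n = 1/(n+1)$.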
Abstract
This paper examines federated learning (FL) in a stochastic approximation (SA) framework. FL is a collaborative way to train neural network models across various participants, or clients, without centralizing their data. Each client trains a model on its own data and periodically sends the weights to the server for aggregation. The server aggregates these weights, which the clients then use to re-initialize their neural networks and continue training. SA is an iterative algorithm that uses approximate sample gradients and a tapering step size to locate a minimizer of a cost function. In this paper, each client uses a stochastic approximation iterate to update the weights of its neural network. It is shown that the aggregated weights track an autonomous ODE. Numerical simulations are performed and the results are compared with standard algorithms such as FedAvg and FedProx. It is observed that the proposed algorithm is robust and gives more reliable estimates of the weights, in particular when the clients' data are not identically distributed.
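The train, aggregate, re-initialize loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: a linear least-squares model stands in for the neural network, all clients share one hypothetical step-size schedule `1 / (n + 20)`, and the server averages weights in proportion to client dataset size. The function name `federated_sa` and its parameters are invented for this example.

```python
import numpy as np

def federated_sa(client_data, rounds=200, local_steps=5, seed=0):
    """Federated least squares with tapering (Robbins-Monro style) step sizes.

    client_data: list of (X, y) pairs, one per client (assumed layout).
    """
    rng = np.random.default_rng(seed)
    dim = client_data[0][0].shape[1]
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    step_count = 0  # local iterations completed so far; drives the decay

    for _ in range(rounds):
        local_weights = []
        for X, y in client_data:
            # Each client re-initializes from the aggregated weights.
            w = w_global.copy()
            for k in range(local_steps):
                i = rng.integers(len(y))           # one random sample
                grad = (X[i] @ w - y[i]) * X[i]    # sample gradient of 0.5 * residual**2
                a = 1.0 / (step_count + k + 20)    # tapering step size (assumed schedule)
                w -= a * grad
            local_weights.append(w)
        step_count += local_steps
        # Server aggregation: average weighted by dataset size (assumption).
        w_global = np.average(local_weights, axis=0, weights=sizes)
    return w_global

if __name__ == "__main__":
    # Three clients whose features are drawn from different distributions.
    rng = np.random.default_rng(1)
    w_true = np.array([1.0, -2.0, 0.5])
    clients = []
    for mu in ([0, 0, 0], [1, 0, 0], [0, -1, 0]):
        X = rng.normal(loc=mu, size=(200, 3))
        clients.append((X, X @ w_true))
    print(federated_sa(clients, rounds=400))
```

The schedule `1 / (n + 20)` satisfies the usual SA conditions (the step sizes sum to infinity while their squares sum to a finite value). A client-adaptive variant, as the summary describes, would give each client its own schedule; how the paper chooses those schedules is not specified here.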