🤖 AI Summary
In dynamic federated learning (FL), client churn induces objective drift and renders standard global model initialization ineffective. This work establishes, for the first time, a non-convex FL convergence theory under dynamic client participation. We propose a gradient-similarity-based weighted historical model initialization mechanism: it adaptively aggregates historical local models by weighting each according to the cosine similarity between its stored gradient and the average gradient of the currently active client set, thereby significantly enhancing rapid model recovery under data distribution shifts. The method is compatible with local SGD and naturally accommodates statistical heterogeneity. Extensive experiments across image and text benchmarks—and integrated with mainstream algorithms including FedAvg and FedProx—demonstrate consistent improvements: up to 3.2× faster convergence and a 40% reduction in communication rounds.
📝 Abstract
Most federated learning (FL) approaches assume a fixed client set. However, real-world scenarios often involve clients dynamically joining or leaving the system based on their needs or interest in specific tasks. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active client set, unlike traditional FL with a static objective; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation. To address these challenges, we first provide a convergence analysis under a non-convex loss with a dynamic client set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity. Building on this analysis, we propose a model initialization algorithm that enables rapid adaptation to new client sets whenever clients join or leave the system. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, to prioritize models trained on data distributions that closely align with the current client set, thereby accelerating recovery from distribution shifts. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability in practice. Experimental results on diverse datasets including both image and text domains, varied label distributions, and multiple FL algorithms demonstrate the effectiveness of the proposed approach across a range of scenarios.