🤖 AI Summary
This work addresses online personalized decentralized learning under statistical heterogeneity across clients. We propose a dynamic collaboration optimization framework grounded in gradient similarity. To mitigate the bias introduced by collaboration and to reduce gradient variance, we design an adaptive collaboration selection criterion and theoretically establish its variance-reduction property for smooth objectives that are strongly convex, non-convex, or satisfy the Polyak–Łojasiewicz condition, while preserving the optimal convergence rate of the All-for-one algorithm. The method integrates distributed optimization, gradient similarity measurement, and a dynamic neighbor selection mechanism. Extensive experiments on synthetic and real-world datasets demonstrate that our approach significantly reduces excess risk, improves training efficiency of personalized models, and enhances generalization performance.
📝 Abstract
We study the problem of online personalized decentralized learning with $N$ statistically heterogeneous clients collaborating to accelerate local training. An important challenge in this setting is to select relevant collaborators so as to reduce gradient variance while mitigating the bias that collaboration introduces. To tackle this, we introduce a gradient-based collaboration criterion that allows each client to dynamically select peers with similar gradients during the optimization process. Our criterion is motivated by a refined and more general theoretical analysis of the All-for-one algorithm, proved to be optimal in Even et al. (2022) for an oracle collaboration scheme. We derive excess-loss upper bounds for smooth objective functions that are either strongly convex, non-convex, or satisfy the Polyak–Łojasiewicz condition; our analysis reveals that the algorithm acts as a variance-reduction method whose speed-up depends on a sufficient variance. We put forward two collaboration methods instantiating the proposed general scheme, and we show that one variant preserves the optimality of All-for-one. We validate our results with experiments on synthetic and real datasets.
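To make the gradient-based collaboration idea concrete, here is a minimal sketch of one round of dynamic peer selection and gradient averaging. It is illustrative only: the cosine-similarity threshold `tau`, the function names, and the plain averaging step are assumptions for this sketch, not the paper's actual criterion or update rule, which is derived from the excess-loss analysis.

```python
import numpy as np

def select_collaborators(grads, i, tau=0.5):
    """Hypothetical criterion: client i selects peers whose gradient
    direction has cosine similarity >= tau with its own gradient."""
    g_i = grads[i]
    selected = []
    for j, g_j in enumerate(grads):
        denom = np.linalg.norm(g_i) * np.linalg.norm(g_j)
        sim = float(g_i @ g_j) / denom if denom > 0 else 0.0
        if sim >= tau:
            selected.append(j)
    return selected

def personalized_step(params, grads, lr=0.1, tau=0.5):
    """One round: each client averages the gradients of its selected
    peers (variance reduction) and takes a local gradient step."""
    new_params = []
    for i, theta in enumerate(params):
        peers = select_collaborators(grads, i, tau)  # always includes i itself
        g_avg = np.mean([grads[j] for j in peers], axis=0)
        new_params.append(theta - lr * g_avg)
    return new_params
```

A client whose peers have similar gradients averages over a larger set, reducing the variance of its update, while clients with dissimilar gradients fall back toward purely local training, limiting the bias of collaboration.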