🤖 AI Summary
Existing second-order federated learning methods (e.g., LocalNewton, LTDA, FedSophia) perform iterative local updates on clients and simple mixing of local parameters on the server, and suffer from slow convergence under data heterogeneity due to drift in local preconditioners. This work proposes FedPM, a framework that introduces preconditioned mixing of local parameters on the server side. FedPM decomposes the ideal second-order update, computed with globally preconditioned gradients, into parameter mixing on the server and local parameter updates on clients, thereby mitigating preconditioner drift at its source. Theoretical analysis shows that FedPM achieves a superlinear convergence rate for strongly convex objectives when clients perform a single local update per round. Extensive experiments on heterogeneous benchmarks demonstrate significant improvements in test accuracy over simple-mixing baselines, validating both the efficacy and stability of second-order optimization in federated learning. The core contributions are (i) a server-side preconditioned mixing rule for local parameters and (ii) a convergence analysis establishing superlinear rates under standard assumptions.
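One way to read this decomposition (in our own notation; the symbols w, g_k, H_k are illustrative and not taken from the paper): if each client k takes a Newton-style step with its local gradient g_k and local preconditioner H_k, and the server mixes the resulting parameters weighted by those same preconditioners, the ideal globally preconditioned step is recovered exactly:

```latex
% Illustrative reconstruction of the decomposition (notation is ours, not the paper's).
% Local gradients g_k, local preconditioners H_k, global preconditioner H = \sum_k H_k,
% one local step per client per round.
\[
  w_k^{+} = w - H_k^{-1} g_k
  \qquad \text{(client $k$: local second-order update)}
\]
\[
  w^{+} \;=\; H^{-1} \sum_k H_k \, w_k^{+}
        \;=\; H^{-1} \sum_k \bigl( H_k w - g_k \bigr)
        \;=\; w - H^{-1} \sum_k g_k
  \qquad \text{(server: preconditioned mixing)}
\]
% The preconditioned mixture of locally updated parameters equals the ideal
% globally preconditioned update; plain averaging of the w_k^{+} would not,
% since it leaves the mismatched local preconditioners entangled.
```

Under this reading, simple averaging of the locally updated parameters is exactly where the preconditioner drift enters, which is what the server-side mixing rule removes.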
📝 Abstract
We propose Federated Preconditioned Mixing (FedPM), a novel Federated Learning (FL) method that leverages second-order optimization. Prior methods such as LocalNewton, LTDA, and FedSophia incorporate second-order optimization in FL by performing iterative local updates on clients and applying simple mixing of local parameters on the server. However, these methods often suffer from drift in local preconditioners, which significantly disrupts convergence, particularly in heterogeneous data settings. To overcome this issue, we refine the update rules by decomposing the ideal second-order update, computed by applying the global preconditioner to the global gradient, into parameter mixing on the server and local parameter updates on clients. As a result, FedPM performs preconditioned mixing of local parameters on the server, effectively mitigating drift in local preconditioners. We provide a theoretical convergence analysis demonstrating a superlinear rate for strongly convex objectives when each client performs a single local update. To demonstrate the practical benefits of FedPM, we conduct extensive experiments; the results show significant improvements in test accuracy over conventional methods that use simple mixing, fully leveraging the potential of second-order optimization.
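A minimal sketch of the server-side idea, assuming Newton-style local steps with explicit Hessians as preconditioners; the function names (`local_newton_step`, `preconditioned_mix`) and the toy quadratic setup are ours for illustration, not the paper's reference implementation:

```python
# Conceptual sketch of server-side preconditioned mixing vs. simple mixing.
# Assumptions (ours, not the paper's): one local Newton step per round,
# full Hessians as local preconditioners, quadratic client objectives.
import numpy as np

def local_newton_step(w, grad_fn, hess_fn):
    """Client k: one local update w - H_k^{-1} g_k; also returns H_k."""
    g, H = grad_fn(w), hess_fn(w)
    return w - np.linalg.solve(H, g), H

def preconditioned_mix(local_params, local_hessians):
    """Server: mix local parameters weighted by local preconditioners,
    normalized by the global preconditioner H = sum_k H_k."""
    H_global = sum(local_hessians)
    weighted = sum(H @ w for H, w in zip(local_hessians, local_params))
    return np.linalg.solve(H_global, weighted)

# Toy heterogeneous clients: f_k(w) = 0.5 w^T A_k w - b_k^T w
rng = np.random.default_rng(0)
d, K = 5, 3
As = [np.diag(rng.uniform(0.5, 5.0, d)) for _ in range(K)]
bs = [rng.normal(size=d) for _ in range(K)]

w = np.zeros(d)
for _ in range(5):
    results = [local_newton_step(w,
                                 lambda v, k=k: As[k] @ v - bs[k],
                                 lambda v, k=k: As[k])
               for k in range(K)]
    params, hessians = zip(*results)
    w = preconditioned_mix(params, hessians)   # FedPM-style mixing
    # Simple mixing, by contrast, would be: w = np.mean(params, axis=0)

# For quadratics, preconditioned mixing recovers the exact global Newton
# step in one round: w == (sum_k A_k)^{-1} sum_k b_k.
print(w)
```

On these toy quadratics, a single round of preconditioned mixing lands on the global Newton iterate, while simple averaging of the locally updated parameters does not. In practical second-order FL, compact approximate preconditioners would stand in for the explicit Hessians used here.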