🤖 AI Summary
This work addresses primal-dual optimization in distributed empirical risk minimization, aiming to unify the theoretical understanding of CoCoA and ADMM-type algorithms. Methodologically, we reformulate the dual problem and establish a unified primal-dual update framework. Our key contribution is the first rigorous proof that CoCoA is equivalent to proximal ADMM applied to the dual problem under a specific choice of the augmented Lagrangian penalty parameter. Furthermore, we demonstrate that judicious tuning of this parameter substantially improves both convergence rate and communication efficiency for various ADMM variants—including consensus, linearized, and proximal ADMM—rendering them uniformly superior to standard CoCoA. We provide a unified convergence analysis with non-asymptotic guarantees. Extensive experiments on synthetic and real-world datasets empirically validate the superiority of parameter-tuned ADMM variants. This work offers new theoretical insights and practical guidance for algorithm selection and design in distributed learning.
📝 Abstract
We study primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, linearized ADMM, and proximal ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This unification reveals key connections between the two classes: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is closely related to a proximal ADMM algorithm. A practical insight follows: by adjusting the augmented Lagrangian parameter, the ADMM variants can readily be made to outperform their CoCoA counterparts. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Our theoretical findings are supported by extensive simulation studies and real-world data analysis.
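To make the consensus-ADMM setting concrete, the following is a minimal sketch (not the paper's implementation) of consensus ADMM for a distributed ridge-regression instance of empirical risk minimization, where each of `K` workers holds a data block `(A_k, b_k)`, keeps a local copy `x_k` of the model, and a global consensus variable `z` is coordinated via scaled dual variables `u_k`. The problem, the splitting, and the role of the augmented Lagrangian parameter `rho` (the tuning knob discussed in the abstract) are assumptions of this toy example; all function and variable names are illustrative.

```python
import numpy as np

def consensus_admm_ridge(blocks, lam=0.1, rho=1.0, iters=500):
    """Consensus ADMM sketch for
        min_x  sum_k 0.5*||A_k x - b_k||^2 + 0.5*lam*||x||^2,
    with per-worker local variables x_k and a global consensus variable z.
    `rho` is the augmented Lagrangian parameter."""
    K = len(blocks)
    d = blocks[0][0].shape[1]
    z = np.zeros(d)
    xs = [np.zeros(d) for _ in range(K)]   # local primal copies
    us = [np.zeros(d) for _ in range(K)]   # scaled dual variables
    # Each worker pre-assembles its local normal-equation matrix A_k^T A_k + rho*I.
    mats = [A.T @ A + rho * np.eye(d) for A, _ in blocks]
    for _ in range(iters):
        # x-update: each worker solves its regularized local least-squares problem.
        for k, (A, b) in enumerate(blocks):
            xs[k] = np.linalg.solve(mats[k], A.T @ b + rho * (z - us[k]))
        # z-update: averaging step; the ridge penalty is placed on z.
        z = rho * sum(x + u for x, u in zip(xs, us)) / (lam + K * rho)
        # dual update: accumulate the consensus residuals.
        for k in range(K):
            us[k] += xs[k] - z
    return z
```

Here `rho` trades off how strongly the local copies are pulled toward consensus each round, which is exactly the parameter whose tuning the paper argues governs convergence and communication efficiency.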