🤖 AI Summary
In constrained deep learning, first-order methods applied to the min-max Lagrangian often oscillate and can fail to find all local solutions. The augmented Lagrangian method (ALM) addresses both issues, yet practitioners often prefer dual optimistic ascent (PI control) on the standard Lagrangian, which works well empirically but lacks formal convergence guarantees. This paper establishes a previously unknown equivalence: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the augmented Lagrangian. The equivalence transfers ALM's theoretical guarantees to the dual optimistic setting, proving linear convergence to all local solutions, and it gives principled guidance for tuning the optimism hyperparameter. The work thereby closes the gap between the empirical success of dual optimistic methods and their theoretical foundation.
📝 Abstract
Constrained optimization is a powerful framework for enforcing requirements on neural networks. These constrained deep learning problems are typically solved using first-order methods on their min-max Lagrangian formulation, but such approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses these issues, practitioners often favor dual optimistic ascent schemes (PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This finding allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation.
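The claimed equivalence can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's implementation: the toy problem (minimize 0.5*(x0^2 + x1^2) subject to x0 + x1 = 1), the function names, the step sizes, the dual-first update order, and the convention that the "previous" constraint value starts at zero are all hypothetical choices, and the paper's exact indexing may differ. Under these assumptions, setting the optimism coefficient `omega` equal to the ALM penalty `c` makes the two iterations apply the same effective multiplier `mu + c*h(x)` at every primal step, so their trajectories coincide.

```python
# Toy problem (assumed for illustration):
#   minimize 0.5*(x0^2 + x1^2)  subject to  h(x) = x0 + x1 - 1 = 0
# Its solution is x* = (0.5, 0.5) with multiplier lambda* = -0.5.

def grad_f(x):
    return (x[0], x[1])

def h(x):
    return x[0] + x[1] - 1.0

def grad_h(x):
    return (1.0, 1.0)

def primal_step(x, multiplier, eta_x):
    """Gradient descent step on f(x) + multiplier*h(x), multiplier held fixed."""
    gf, gh = grad_f(x), grad_h(x)
    return tuple(xi - eta_x * (a + multiplier * b) for xi, a, b in zip(x, gf, gh))

def alm_gda(x, mu, eta_x, eta_dual, c, steps):
    """Dual-first gradient descent-ascent on the augmented Lagrangian
    f(x) + mu*h(x) + (c/2)*h(x)^2."""
    traj = []
    for _ in range(steps):
        g = h(x)
        mu = mu + eta_dual * g                 # gradient ascent on mu
        # grad_x of the augmented Lagrangian is grad f + (mu + c*h) * grad h
        x = primal_step(x, mu + c * g, eta_x)
        traj.append(x)
    return traj, mu

def optimistic_dual(x, lam, eta_x, eta_dual, omega, steps):
    """Dual optimistic ascent (PI control) on the plain Lagrangian f(x) + lam*h(x)."""
    traj = []
    g_prev = 0.0                               # assumed convention for the first step
    for _ in range(steps):
        g = h(x)
        # integral term eta_dual*g plus optimistic/proportional correction
        lam = lam + eta_dual * g + omega * (g - g_prev)
        x = primal_step(x, lam, eta_x)         # plain Lagrangian gradient step
        g_prev = g
        traj.append(x)
    return traj, lam

x0 = (2.0, -1.0)
traj_alm, mu_final = alm_gda(x0, 0.0, eta_x=0.1, eta_dual=0.1, c=1.0, steps=500)
traj_pi, lam_final = optimistic_dual(x0, 0.0, eta_x=0.1, eta_dual=0.1, omega=1.0, steps=500)

max_gap = max(abs(a - b) for xa, xb in zip(traj_alm, traj_pi) for a, b in zip(xa, xb))
print("largest gap between the two trajectories:", max_gap)
print("final iterate:", traj_alm[-1])
```

In this sketch the link is visible in one line of algebra: if `m = mu + c*h(x)` is the multiplier the ALM primal step uses, then consecutive values differ by `eta_dual*h_t + c*(h_t - h_prev)`, which is the optimistic update with `omega = c`. Read this way, the optimism coefficient plays the role of the ALM penalty parameter in this toy setting.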