🤖 AI Summary
This work establishes a theoretical foundation for the global convergence of gradient-based methods to von Neumann–Nash equilibria in nonconvex–nonconcave neural-network minimax games. Motivated by the empirical success of gradient methods in adversarial training and AI alignment, particularly for wide networks, which has so far lacked theoretical justification, the authors first prove that, under overparameterization, two-layer neural networks satisfy the two-sided Polyak–Łojasiewicz (PŁ) condition with high probability under random initialization. Building on this, they derive a path-length bound for alternating gradient descent–ascent, revealing an implicit convex–concave structure induced by overparameterization. The analysis rigorously establishes global convergence of simple gradient dynamics to equilibria in wide networks, providing the first provable convergence guarantee for robust optimization in this setting, along with verifiable sufficient conditions.
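For concreteness, the two-sided PŁ condition mentioned above is typically stated as follows in the minimax literature (the constants $\mu_1, \mu_2$ here are illustrative placeholders, not values taken from the paper): for all $(x, y)$,

$$
\|\nabla_x f(x,y)\|^2 \ \ge\ 2\mu_1 \Bigl( f(x,y) - \min_{x'} f(x',y) \Bigr),
\qquad
\|\nabla_y f(x,y)\|^2 \ \ge\ 2\mu_2 \Bigl( \max_{y'} f(x,y') - f(x,y) \Bigr).
$$

Intuitively, the gradient norm for each player lower-bounds that player's suboptimality, which is what allows gradient dynamics to make global progress even without convexity or concavity.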
📝 Abstract
Many emerging applications, such as adversarial training, AI alignment, and robust optimization, can be framed as zero-sum games between neural networks, with von Neumann–Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions, spanning initialization, training dynamics, and network width, that guarantee global convergence to a NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games involving two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to a two-sided Polyak–Łojasiewicz (PŁ) min-max condition holds with high probability under overparameterization, using tools from random matrix theory.
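The alternating gradient descent-ascent scheme analyzed in the paper can be sketched on a toy objective. The sketch below is illustrative only: the quadratic objective, step size, and iteration count are assumptions chosen for a minimal runnable example, not the paper's neural-network setting.

```python
# Minimal sketch of alternating gradient descent-ascent (GDA) on a toy
# strongly convex-concave objective
#     f(x, y) = x**2/2 - y**2/2 + x*y,
# whose unique saddle point (Nash equilibrium) is the origin.
# "Alternating" means the max player updates using the min player's
# freshly updated iterate, rather than both updating simultaneously.

def alternating_gda(x, y, eta=0.1, iters=500):
    """Run alternating GDA: descend in x, then ascend in y at the new x."""
    for _ in range(iters):
        gx = x + y          # df/dx for the toy objective
        x = x - eta * gx    # descent step for the min player
        gy = x - y          # df/dy, evaluated at the updated x
        y = y + eta * gy    # ascent step for the max player
    return x, y

x_star, y_star = alternating_gda(1.0, 1.0)
print(x_star, y_star)  # both coordinates approach the equilibrium (0, 0)
```

On this toy problem the alternating iteration contracts toward the saddle point at a geometric rate; the paper's contribution is showing that, under overparameterization, wide two-layer networks inherit enough hidden convex-concave structure for similar dynamics to converge globally.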