🤖 AI Summary
This work establishes a theoretical foundation for the global convergence of gradient-based methods to von Neumann–Nash equilibria in nonconvex–nonconcave neural-network minimax games. Motivated by the empirical success of gradient methods in adversarial training and AI alignment, particularly for wide networks, which has so far lacked theoretical justification, the authors first prove that, under overparameterization, two-layer neural networks satisfy the two-sided Polyak–Łojasiewicz (PŁ) condition with high probability under random initialization. Building on this, they derive a path-length bound for alternating gradient descent–ascent, revealing an implicit convex–concave structure induced by overparameterization. The analysis rigorously establishes global convergence of simple gradient dynamics to equilibria in wide networks, providing the first provable convergence guarantee for robust optimization in this setting, along with verifiable sufficient conditions.
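For concreteness, the two-sided PŁ condition mentioned above is typically stated as follows in the minimax literature (the constants $\mu_1, \mu_2$ here are illustrative placeholders, not values taken from the paper): for all $(x, y)$,

$$
\|\nabla_x f(x,y)\|^2 \ \ge\ 2\mu_1 \Bigl( f(x,y) - \min_{x'} f(x',y) \Bigr),
\qquad
\|\nabla_y f(x,y)\|^2 \ \ge\ 2\mu_2 \Bigl( \max_{y'} f(x,y') - f(x,y) \Bigr).
$$

Intuitively, the gradient norm for each player lower-bounds that player's suboptimality, which is what allows gradient dynamics to make global progress even without convexity or concavity.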
📝 Abstract
Many emerging applications, such as adversarial training, AI alignment, and robust optimization, can be framed as zero-sum games between neural networks, with von Neumann–Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions, spanning initialization, training dynamics, and network width, that guarantee global convergence to a NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games involving two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to a two-sided Polyak–Łojasiewicz (PŁ) min-max condition holds with high probability under overparameterization, using tools from random matrix theory.
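The alternating gradient descent-ascent scheme analyzed in the paper can be sketched on a toy objective. The sketch below is illustrative only: the quadratic objective, step size, and iteration count are assumptions chosen for a minimal runnable example, not the paper's neural-network setting.

```python
# Minimal sketch of alternating gradient descent-ascent (GDA) on a toy
# strongly convex-concave objective
#     f(x, y) = x**2/2 - y**2/2 + x*y,
# whose unique saddle point (Nash equilibrium) is the origin.
# "Alternating" means the max player updates using the min player's
# freshly updated iterate, rather than both updating simultaneously.

def alternating_gda(x, y, eta=0.1, iters=500):
    """Run alternating GDA: descend in x, then ascend in y at the new x."""
    for _ in range(iters):
        gx = x + y          # df/dx for the toy objective
        x = x - eta * gx    # descent step for the min player
        gy = x - y          # df/dy, evaluated at the updated x
        y = y + eta * gy    # ascent step for the max player
    return x, y

x_star, y_star = alternating_gda(1.0, 1.0)
print(x_star, y_star)  # both coordinates approach the equilibrium (0, 0)
```

On this toy problem the alternating iteration contracts toward the saddle point at a geometric rate; the paper's contribution is showing that, under overparameterization, wide two-layer networks inherit enough hidden convex-concave structure for similar dynamics to converge globally.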