🤖 AI Summary
In multi-agent AI systems, nonlinear policies induce highly variable local curvature in the inner maximization problem, rendering standard robust training methods either unstable or overly conservative and thereby significantly degrading performance. To address this, the work proposes Adversarially-Aligned Jacobian Regularization (AAJR), which aligns Jacobian constraints with the adversarial ascent direction, regulating policy sensitivity only along that critical direction. This approach ensures stability of the inner optimization while mitigating the performance cost typically associated with robustness. Theoretical analysis demonstrates that, under mild conditions, AAJR yields smaller approximation errors, reduced nominal performance loss, and an expanded feasible policy space, thereby achieving a favorable trade-off between robustness and expressive capacity.
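To make the mechanism concrete, here is a minimal sketch of how a directional Jacobian penalty of this kind could be implemented in PyTorch. The names (`aajr_penalty`, `policy`, `inner_loss`) and the choice of a squared directional penalty are illustrative assumptions, not the paper's code:

```python
import torch

def aajr_penalty(policy, obs, inner_loss, lam=0.1, eps=1e-8):
    # Hypothetical sketch, not the paper's implementation: `policy` maps a
    # batch of observations (B, d) to actions, and `inner_loss(actions)`
    # is the scalar inner-maximization objective.
    obs = obs.detach().requires_grad_(True)

    # 1) Adversarial ascent direction: gradient of the inner objective
    #    w.r.t. the observation, normalized per sample and treated as fixed.
    (g,) = torch.autograd.grad(inner_loss(policy(obs)), obs)
    v = g / (g.flatten(1).norm(dim=1).view(-1, *([1] * (g.dim() - 1))) + eps)

    # 2) Jacobian-vector product J_pi(obs) @ v: the policy's sensitivity
    #    along the ascent direction only -- no full Jacobian is formed.
    _, jv = torch.autograd.functional.jvp(policy, (obs,), (v,), create_graph=True)

    # 3) Penalize squared sensitivity along v; all other directions of
    #    J_pi stay unconstrained, unlike a global spectral-norm bound.
    return lam * jv.flatten(1).pow(2).sum(dim=1).mean()
```

In training, this term would simply be added to the nominal objective, e.g. `loss = nominal_loss + aajr_penalty(policy, obs, inner_loss)`, so that only the one-dimensional Jacobian slice along the ascent direction is damped.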
📝 Abstract
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly nonlinear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
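To make the contrast with global bounds explicit, one plausible formalization in our own notation (the symbols $J_\pi$, $\ell$, and $v_{\mathrm{adv}}$ are assumptions, not the paper's definitions) is: a global constraint bounds the spectral norm of the policy's input Jacobian in every direction, whereas the aligned constraint bounds it only along the adversarial ascent direction.

```latex
% Global Jacobian bound: every input direction is constrained.
\sigma_{\max}\!\big(J_\pi(s)\big)
  \;=\; \sup_{\|v\|=1} \big\| J_\pi(s)\, v \big\| \;\le\; L
  \quad \forall s
% Aligned (AAJR-style) constraint: only the normalized ascent
% direction of the inner objective \ell is controlled.
v_{\mathrm{adv}}(s)
  \;=\; \frac{\nabla_s\, \ell\big(\pi(s)\big)}{\big\|\nabla_s\, \ell\big(\pi(s)\big)\big\|},
\qquad
\big\| J_\pi(s)\, v_{\mathrm{adv}}(s) \big\| \;\le\; L
```

Under this reading, any policy feasible under the global bound is automatically feasible under the aligned one (the supremum dominates any single direction), which is the sense in which the admissible policy class can only grow.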