🤖 AI Summary
This work addresses the limitations of existing distributionally robust optimization methods, whose risk certification typically relies on global Lipschitz constants or local gradients; such methods struggle with non-Lipschitz or non-differentiable losses and yield either overly conservative bounds or first-order approximation errors. To overcome these issues, the paper introduces a novel geometric framework based on growth-rate functions and proposes, for the first time, the concept of a "concave certificate," which dispenses with conventional Lipschitz and differentiability assumptions. This approach yields tight distributionally robust risk upper bounds by integrating Wasserstein ambiguity sets with adversarial scores, enabling efficient layer-wise analysis of neural networks. Notably, it derives deterministic generalization bounds whose complexity does not depend on input diameter, network width, or depth. Empirical evaluations on real-world classification and regression tasks demonstrate that the proposed method produces tighter risk bounds and more accurate generalization estimates.
📄 Abstract
Distributionally Robust (DR) optimization aims to certify worst-case risk within a Wasserstein uncertainty set. Current certifications typically rely either on global Lipschitz bounds, which are often conservative, or on local gradient information, which provides only a first-order approximation. This paper introduces a novel geometric framework based on the least concave majorant of the growth-rate function. Our proposed concave certificate establishes a tight bound on the DR risk that remains applicable to non-Lipschitz and non-differentiable losses. We extend this framework to complexity analysis, introducing a deterministic bound that complements standard statistical generalization bounds. Furthermore, we use this certificate to bound the gap between adversarial and empirical Rademacher complexity, showing that dependencies on input diameter, network width, and depth can be eliminated. For practical application in deep learning, we introduce the adversarial score as a tractable relaxation of the concave certificate, enabling efficient layer-wise analysis of neural networks. We validate our theoretical results through numerical experiments on classification and regression tasks with real-world data.
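To make the central construction concrete, the following is a minimal sketch of a least concave majorant (upper concave envelope) computed from sampled values of a growth-rate function. The names (`least_concave_majorant`, `evaluate`) and the sampled points are hypothetical illustrations, not the paper's implementation; the hull-based construction is the standard monotone-chain upper hull, and the final line illustrates the generic shape of a certificate of the form "empirical risk plus majorant evaluated at the Wasserstein radius," under the assumption that the growth-rate samples upper-bound the loss increase at each perturbation budget.

```python
def least_concave_majorant(points):
    """Upper concave envelope of sampled (delta, growth) pairs.

    Uses the monotone-chain upper hull: a point is discarded when it
    lies on or below the segment joining its neighbours.
    """
    pts = sorted(points)
    hull = []
    for p in pts:
        # cross >= 0 means the middle point is not strictly above the
        # chord, so it does not support the concave envelope.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def evaluate(hull, x):
    """Piecewise-linear evaluation of the envelope, clamped at the ends."""
    if x <= hull[0][0]:
        return hull[0][1]
    if x >= hull[-1][0]:
        return hull[-1][1]
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


# Non-concave samples: the envelope replaces the dip at delta = 1.
samples = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0)]
hull = least_concave_majorant(samples)

# Illustrative certificate shape (assumed form, not the paper's exact bound):
# worst-case risk <= empirical risk + majorant(epsilon).
empirical_risk, epsilon = 0.10, 1.0
dr_bound = empirical_risk + evaluate(hull, epsilon)
```

Concavity is what makes such an envelope useful: by Jensen's inequality, an average of per-sample loss increases under a transport plan with average budget epsilon is bounded by the envelope evaluated at epsilon, which is where the tightness over a global Lipschitz line comes from.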