🤖 AI Summary
This work addresses the sensitivity of Learning-to-Optimize (L2O) hyperparameters to shifts in the distribution of problem instances by proposing the first hyperparameter learning framework based on Wasserstein distributionally robust optimization (DRO). The approach unifies two paradigms: empirical-performance-driven L2O and worst-case algorithm design via the performance estimation problem (PEP). Given a dataset of problem instances, it learns hyperparameters of first-order methods, such as step sizes, with a stochastic gradient method that differentiates through the solution of an inner semidefinite program at each step. The robustness radius interpolates between the two extremes: as it vanishes the method recovers classical L2O, and as it grows it recovers worst-case optimal design via PEP. The method comes with high-probability generalization bounds: the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. Experiments on unconstrained quadratic minimization, LASSO, and linear programming show that the learned algorithms outperform both vanilla L2O and worst-case optimal baselines out of sample while retaining certifiable robustness.
📝 Abstract
We propose a distributionally robust approach to learning hyperparameters for first-order methods in convex optimization. Given a dataset of problem instances, we minimize a Wasserstein distributionally robust version of the performance estimation problem (PEP) over algorithm parameters such as step sizes. Our framework unifies two extremes: as the robustness radius vanishes, we recover classical learning to optimize (L2O); as it grows, we recover worst-case optimal algorithm design via PEP. We solve the resulting problem with stochastic gradient descent, differentiating through the solution of an inner semidefinite program at each step. We prove high-probability bounds showing that the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. On unconstrained quadratic minimization, LASSO, and linear programming benchmarks, our learned algorithms achieve strong out-of-sample performance with certifiable robustness, outperforming both worst-case optimal and vanilla L2O baselines.
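The two extremes named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it does not solve the Wasserstein DRO problem or any semidefinite program. It only contrasts the two endpoints for a single step size `t` of gradient descent on diagonal quadratics. The setup (curvature range `[mu, L]`, the sampled instances, the grid search, and all function names) is an assumption made for illustration.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): diagonal quadratics
# f(x) = 0.5 * sum_i h_i * x_i^2 with curvatures h_i in [mu, L].
mu, L, K = 0.1, 1.0, 10
rng = np.random.default_rng(0)

# Sampled instances whose curvatures cluster near L, so the data are
# much easier than the worst case over [mu, L].
curvatures = rng.uniform(0.8, 1.0, size=(50, 5))  # 50 instances, dimension 5
x0 = rng.standard_normal((50, 5))

def empirical_loss(t):
    """Mean of ||x_K||^2 / ||x_0||^2 after K gradient steps of size t."""
    # Gradient descent on a diagonal quadratic scales coordinate i by (1 - t * h_i).
    xK = (1.0 - t * curvatures) ** K * x0
    return np.mean(np.sum(xK**2, axis=1) / np.sum(x0**2, axis=1))

def worst_case_loss(t):
    """Supremum of the same ratio over all curvatures in [mu, L]."""
    # |1 - t * h| is convex in h, so its maximum over [mu, L] is at an endpoint.
    return max(abs(1.0 - t * mu), abs(1.0 - t * L)) ** (2 * K)

grid = np.linspace(0.01, 2.0, 400)
emp = np.array([empirical_loss(t) for t in grid])
wc = np.array([worst_case_loss(t) for t in grid])

t_emp = grid[np.argmin(emp)]  # vanishing-radius extreme (plain L2O)
t_wc = grid[np.argmin(wc)]    # large-radius extreme (worst-case design)

print(f"empirical-optimal step:  {t_emp:.3f}")
print(f"worst-case-optimal step: {t_wc:.3f}  (theory: {2 / (mu + L):.3f})")
```

On this toy data the empirically tuned step is tailored to the sampled curvatures and carries a weaker guarantee over the full range `[mu, L]`, while the worst-case step is safe everywhere but slower on the observed instances. A distributionally robust choice with an intermediate radius would sit between the two.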