Distributionally-Robust Learning to Optimize

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sensitivity of Learning-to-Optimize (L2O) hyperparameters to shifts in the distribution of problem instances by proposing a hyperparameter learning framework based on Wasserstein distributionally robust optimization (DRO). The approach unifies empirical-performance-driven L2O with worst-case algorithm design via the performance estimation problem (PEP). Given a dataset of problem instances, it learns hyperparameters of first-order optimizers, such as step sizes, with a stochastic gradient algorithm that differentiates through the solution of an inner semidefinite program at each step. The method comes with high-probability generalization bounds: the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. Experiments on unconstrained quadratic minimization, LASSO, and linear programming show that the learned algorithms outperform vanilla L2O and worst-case optimal baselines out of sample while maintaining certifiable robustness.
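
As a rough guide to how the two paradigms fit into one objective, a generic Wasserstein DRO problem can be written as below. The notation is assumed for illustration and is not taken from the paper: \theta collects the algorithm hyperparameters, f_1, ..., f_N are the training instances, \ell(\theta, f) is a performance measure of the algorithm on instance f, W is a Wasserstein distance, \varepsilon is the robustness radius, and \mathcal{F} is the function class to which the distributions Q are restricted.

```latex
\[
\min_{\theta}\ \sup_{Q \,:\, W(Q, \hat{P}_N) \le \varepsilon}\
\mathbb{E}_{f \sim Q}\big[\ell(\theta, f)\big],
\qquad
\hat{P}_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{f_i}
\]
% Radius zero: the ambiguity set contains only the empirical distribution (L2O-style objective).
\[
\varepsilon = 0:\quad \min_{\theta}\ \frac{1}{N} \sum_{i=1}^{N} \ell(\theta, f_i)
\]
% Large radius: every distribution supported on the class is admissible (PEP-style objective).
\[
\varepsilon \to \infty:\quad \min_{\theta}\ \sup_{f \in \mathcal{F}} \ell(\theta, f)
\]
```

The paper's exact loss, transport cost, and function class may differ from this sketch.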

📝 Abstract

We propose a distributionally robust approach to learning hyperparameters for first-order methods in convex optimization. Given a dataset of problem instances, we minimize a Wasserstein distributionally robust version of the performance estimation problem (PEP) over algorithm parameters such as step sizes. Our framework unifies two extremes: as the robustness radius vanishes, we recover classical learning to optimize (L2O); as it grows, we recover worst-case optimal algorithm design via PEP. We solve the resulting problem with stochastic gradient descent, differentiating through the solution of an inner semidefinite program at each step. We prove high-probability bounds showing that the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. On unconstrained quadratic minimization, LASSO, and linear programming benchmarks, our learned algorithms achieve strong out-of-sample performance with certifiable robustness, outperforming both worst-case optimal and vanilla L2O baselines.
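
The interpolation between the two extremes can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes one-dimensional quadratics f(x) = a x^2 / 2 with curvature a in [MU, L_SMOOTH], swaps in a type-infinity Wasserstein ball (for which the worst-case expectation is the average of per-sample worst cases), and replaces stochastic gradient descent through a semidefinite program with a grid search over a single step size. All names and constants are hypothetical.

```python
import numpy as np

# Hypothetical problem class: f(x) = a * x**2 / 2 with curvature a in [MU, L_SMOOTH].
MU, L_SMOOTH, K = 0.1, 1.0, 5  # K = number of gradient steps


def loss(t, a):
    """Squared distance to the minimizer after K gradient steps with step size t, from x0 = 1."""
    return (1.0 - t * a) ** (2 * K)


def robust_risk(t, samples, eps):
    """Type-infinity Wasserstein worst-case risk (assumed stand-in for the paper's ambiguity set).

    Each sampled curvature may move by at most eps while staying inside the class.
    The loss is convex in a, so its maximum over an interval sits at an endpoint.
    """
    total = 0.0
    for a in samples:
        lo, hi = max(MU, a - eps), min(L_SMOOTH, a + eps)
        total += max(loss(t, lo), loss(t, hi))
    return total / len(samples)


def learn_step_size(samples, eps):
    """Grid search over the step size (stands in for SGD through an inner SDP)."""
    candidates = np.linspace(0.0, 2.0 / L_SMOOTH, 401)
    risks = [robust_risk(t, samples, eps) for t in candidates]
    return float(candidates[int(np.argmin(risks))])


rng = np.random.default_rng(0)
samples = rng.uniform(0.7, 0.9, size=20)  # training instances cluster near a = 0.8

t_l2o = learn_step_size(samples, eps=0.0)            # radius 0: plain empirical L2O
t_mid = learn_step_size(samples, eps=0.1)            # intermediate robustness
t_pep = learn_step_size(samples, eps=L_SMOOTH - MU)  # radius covers the whole class

print(f"L2O step: {t_l2o:.3f}, DRO step: {t_mid:.3f}, worst-case step: {t_pep:.3f}")
```

With the radius set to L_SMOOTH - MU, every sample can reach both ends of the curvature range, so the learned step lands next to 2 / (MU + L_SMOOTH), the classical worst-case-optimal constant step for this class. With radius zero, the step adapts to the sampled curvatures instead.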

Problem

Research questions and friction points this paper is trying to address.

distributionally robust optimization

learning to optimize

hyperparameter learning

performance estimation problem

Wasserstein ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

distributionally robust optimization

learning to optimize

performance estimation problem