🤖 AI Summary
This work addresses the lack of provable joint guarantees on generalization and convergence in learned optimization algorithms. Methodologically: (1) it establishes the first PAC-Bayesian generalization bound for unbounded losses, leveraging exponential-family posterior distributions; (2) it formulates optimizer learning as a tractable one-dimensional global optimization problem—convex or non-convex—whose solution is analytically characterizable; and (3) it integrates stochastic optimization design with rigorous theoretical analysis to explicitly trade off convergence rate against generalization error. Empirically, the learned optimizers achieve order-of-magnitude improvements over state-of-the-art methods across four diverse real-world tasks—including neural architecture search, meta-learning, adversarial training, and federated learning—while all gains are underpinned by formal theoretical guarantees. This constitutes the first learning-to-optimize framework endowed with a provably tight PAC-Bayesian generalization bound and jointly certified convergence–generalization performance.
📝 Abstract
We use the PAC-Bayesian theory for the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones derived from a (deterministic) worst-case analysis. The results rely on PAC-Bayesian bounds for general, possibly unbounded loss-functions based on exponential families. Then, we reformulate the learning procedure into a one-dimensional minimization problem and study the possibility to find a global minimum. Furthermore, we provide a concrete algorithmic realization of the framework and new methodologies for learning-to-optimize, and we conduct four practically relevant experiments to support our theory. With this, we showcase that the provided learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude.