🤖 AI Summary
Learned optimizers (L2Os) suffer from poor out-of-distribution generalization, limiting their applicability beyond the training data distribution.
Method: This paper proposes a paradigm that integrates classical optimization priors with data-driven modeling. It systematically incorporates fundamental optimization principles, specifically scale invariance and affine covariance, into the architecture design. It introduces a parameterized quasi-Newton update module explicitly constrained to preserve the BFGS structure, and jointly optimizes this module via end-to-end training that unifies optimization-theoretic modeling, neural-network architecture design, and meta-learning.
Contribution/Results: The resulting learning-enhanced BFGS algorithm significantly outperforms both standard L2Os and conventional solvers on unseen problem classes, dimensions, and condition numbers. It achieves over 40% improvement in cross-distribution generalization, establishing a new pathway toward more transferable and robust learned optimizers.
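For context, the update that the paper's learned module is constrained to preserve is the classical BFGS inverse-Hessian update. The sketch below is the standard textbook algorithm (with a simple Armijo backtracking line search), not the paper's learned variant; all function names and the test problem are illustrative assumptions.

```python
import numpy as np

def bfgs_update(H, s, y):
    """Classical BFGS update of the inverse-Hessian approximation H,
    given step s = x_{k+1} - x_k and gradient change y = g_{k+1} - g_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def bfgs_minimize(f, grad, x0, iters=100, tol=1e-8):
    """Plain BFGS with Armijo backtracking; enough to illustrate the
    update structure that a learned module would parameterize."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        t = 1.0
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5                    # backtrack until Armijo holds
        x_new = x + t * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:               # curvature condition keeps H positive definite
            H = bfgs_update(H, s, y)
        x, g = x_new, g_new
    return x
```

Note the affine structure of the update: it is built only from the pairs (s, y), which is what makes properties like affine covariance natural to enforce when the update is parameterized and learned.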
📝 Abstract
Towards designing learned optimization algorithms that are usable beyond their training setting, we identify key principles that classical algorithms obey but that have, until now, not been used for Learning to Optimize (L2O). Following these principles, we provide a general design pipeline, taking into account data, architecture, and learning strategy, and thereby enable a synergy between classical optimization and L2O, resulting in a philosophy of Learning Optimization Algorithms. As a consequence, our learned algorithms perform well far beyond problems from the training distribution. We demonstrate the success of these novel principles by designing a new learning-enhanced BFGS algorithm and provide numerical experiments evidencing its adaptation to many settings at test time.