POP: Prior-fitted Optimizer Policies

📅 2026-02-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the sensitivity of traditional gradient-based optimizers to hyperparameters and their limited generalization in highly non-convex problems. The authors propose POP, a meta-learned optimizer that achieves strong out-of-the-box generalization without task-specific hyperparameter tuning by leveraging large-scale synthetic priors encompassing both convex and non-convex objectives. POP dynamically predicts per-coordinate step sizes through a context-aware policy network that exploits information from the optimization trajectory. Evaluated across 47 benchmark functions of varying complexity, POP consistently outperforms first-order gradient methods, evolutionary strategies, Bayesian optimization, and existing meta-learning optimizers.
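To make the mechanism concrete, here is a minimal sketch of a context-aware step-size policy in the spirit described above. The feature set (raw gradient, an exponential moving average of gradients, and gradient magnitude), the MLP architecture, and the sign-based update are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class StepSizePolicy(nn.Module):
    """Hypothetical context-aware policy network: maps per-coordinate
    trajectory features to a positive per-coordinate step size."""

    def __init__(self, n_features: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Softplus(),  # constrain predicted step sizes to be positive
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (dim, n_features) -> step sizes of shape (dim,)
        return self.net(feats).squeeze(-1)


def pop_style_step(x, grad, ema_grad, policy, beta=0.9):
    """One optimizer step with policy-predicted per-coordinate step sizes.
    The trajectory features used here are an illustrative assumption,
    not the paper's exact context encoding."""
    ema_grad = beta * ema_grad + (1 - beta) * grad
    feats = torch.stack([grad, ema_grad, grad.abs()], dim=-1)
    with torch.no_grad():
        eta = policy(feats)        # (dim,) predicted step sizes
    x = x - eta * grad.sign()      # coordinate-wise descent move
    return x, ema_grad
```

In this reading, the policy's weights are fit once during meta-training and then frozen, so at test time the optimizer runs with no per-task hyperparameter tuning, which is what the out-of-the-box generalization claim refers to.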

📝 Abstract
Optimization refers to the task of finding extrema of an objective function. Classical gradient-based optimizers are highly sensitive to hyperparameter choices; in strongly non-convex settings their performance hinges on carefully tuned learning rates, momentum, and gradient accumulation. To address these limitations, we introduce POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes conditioned on contextual information from the optimization trajectory. Our model is trained on millions of synthetic optimization problems sampled from a novel prior spanning both convex and non-convex objectives. We evaluate POP on an established benchmark of 47 optimization functions of varying complexity, where it consistently outperforms first-order gradient-based methods, non-convex optimization approaches (e.g., evolutionary strategies), Bayesian optimization, and a recent meta-learned competitor under matched budget constraints. Our evaluation demonstrates strong generalization capabilities without task-specific tuning.
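As a rough illustration of what a synthetic prior spanning convex and non-convex objectives could look like, the sketch below samples random convex quadratics and optionally perturbs them with a sinusoidal term. These two function families, and all parameter ranges, are assumptions made for illustration; the paper's actual prior may differ.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def sample_objective(dim: int):
    """Draw one synthetic training objective: a random convex quadratic
    plus an optional sinusoidal perturbation that makes it non-convex.
    (Illustrative prior; not the paper's actual construction.)"""
    A = rng.normal(size=(dim, dim))
    Q = A @ A.T + dim * np.eye(dim)     # symmetric positive definite part
    b = rng.normal(size=dim)
    amp = rng.uniform(0.0, 2.0)         # amp == 0 recovers a convex task
    freq = rng.uniform(0.5, 3.0, size=dim)

    def f(x: np.ndarray) -> float:
        quadratic = 0.5 * x @ Q @ x + b @ x
        return quadratic + amp * np.sum(np.sin(freq * x))

    return f


# Meta-training would fit the policy over millions of such draws;
# here we only sample a handful of tasks to show the interface.
tasks = [sample_objective(dim=5) for _ in range(3)]
print(tasks[0](np.zeros(5)))
```

Training against a broad distribution of this kind, rather than a single task family, is what lets a prior-fitted policy transfer to unseen benchmark functions without retuning.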
Problem

Research questions and friction points this paper is trying to address.

optimization
hyperparameter sensitivity
non-convex optimization
generalization
gradient-based optimizers
Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-learning
optimizer policies
coordinate-wise step sizes
optimization trajectory
non-convex optimization
Jan Kobiolka
Department of Computer Science and Artificial Intelligence, University of Technology Nuremberg, Germany

Christian Frey
Department of Computer Science and Artificial Intelligence, University of Technology Nuremberg, Germany

Gresa Shala
PhD candidate, University of Freiburg
Meta-learning, Dynamic Algorithm Configuration, Reinforcement Learning

Arlind Kadra
PhD, University of Freiburg
Deep Learning, Meta-Learning, AutoML

Erind Bedalli
Faculty of Natural Sciences, University of Elbasan, Albania

Josif Grabocka
Professor of Machine Learning
Machine Learning