🤖 AI Summary
This work addresses the inefficiency of machine-learning surrogates for optimization and simulation tasks, which either rely on expensive high-quality labels or face difficult optimization landscapes. To overcome these challenges, the authors propose a three-stage learning framework: first, collecting abundant but imperfect low-cost labels; second, supervised pretraining on those labels; and third, self-supervised fine-tuning to refine the model and improve feasibility. Theoretical analysis shows that only a small number of low-precision labels is needed to place the model within the basin of attraction of the optimal solution, substantially reducing both data and computational costs. Empirical evaluations across nonconvex constrained optimization, power-grid scheduling, and stiff dynamical systems show faster convergence, higher accuracy, and improved feasibility, achieving up to a 59-fold reduction in total offline cost.
📝 Abstract
To scale the solution of optimization and simulation problems, prior work has explored machine-learning surrogates that inexpensively map problem parameters to corresponding solutions. Commonly used approaches, including supervised and self-supervised learning with either soft or hard feasibility enforcement, face inherent challenges such as reliance on expensive, high-quality labels or difficult optimization landscapes. To address their trade-offs, we propose a novel framework that first collects "cheap" imperfect labels, then performs supervised pretraining, and finally refines the model through self-supervised learning to improve overall performance. Our theoretical analysis and merit-based criterion show that labeled data need only place the model within a basin of attraction, confirming that only modest numbers of inexact labels and training epochs are required. We empirically validate our simple three-stage strategy across challenging domains, including nonconvex constrained optimization, power-grid operation, and stiff dynamical systems, and show that it yields faster convergence; improved accuracy, feasibility, and optimality; and up to 59x reductions in total offline cost.
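The three-stage strategy can be illustrated on a toy parametric problem. The sketch below is not the authors' implementation: the objective `f(x, p) = (x - sin(p))**2`, the polynomial-feature surrogate, and all learning rates and sample sizes are illustrative assumptions. Stage 1 collects noisy "cheap" labels, stage 2 fits the surrogate to them by supervised gradient descent, and stage 3 fine-tunes label-free by descending the objective itself, which the pretrained model can do efficiently because it already starts near the optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parametric problem (hypothetical): for each parameter p, the true
# optimal solution is x*(p) = sin(p), and the known, differentiable
# objective is f(x, p) = (x - sin(p))**2.
def features(p):
    # Simple cubic-polynomial surrogate features (illustrative choice).
    return np.stack([np.ones_like(p), p, p**2, p**3], axis=1)

def predict(w, p):
    return features(p) @ w

# Stage 1: collect "cheap" imperfect labels -- noisy approximations
# of the true solution, standing in for a low-precision solver.
p_train = rng.uniform(-2, 2, size=256)
cheap_labels = np.sin(p_train) + rng.normal(0.0, 0.2, size=p_train.shape)

w, lr, n = np.zeros(4), 0.05, len(p_train)

# Stage 2: supervised pretraining on the imperfect labels (MSE loss).
for _ in range(1000):
    err = predict(w, p_train) - cheap_labels
    w -= lr * features(p_train).T @ err / n

# Stage 3: self-supervised fine-tuning -- no labels, descend f directly.
for _ in range(1000):
    grad_x = 2.0 * (predict(w, p_train) - np.sin(p_train))  # df/dx
    w -= lr * features(p_train).T @ grad_x / n

p_test = np.linspace(-2, 2, 100)
mse = np.mean((predict(w, p_test) - np.sin(p_test)) ** 2)
print(f"test MSE vs. true optimum: {mse:.4f}")
```

In this sketch the supervised stage absorbs label noise but lands the weights inside the basin of attraction, so the short self-supervised stage converges to (near) the best surrogate the model class admits, mirroring the paper's claim that only modest numbers of inexact labels and training epochs are required.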