Learning to Assess the Reliability of Number-of-Runs Estimation in Stochastic Optimization

📅 2026-05-27

🤖 AI Summary

This study addresses a key challenge in large-scale benchmarking of stochastic optimization algorithms: determining, with minimal computational overhead, whether the number of runs is sufficient for a reliable performance evaluation. The authors extend a recent empirical online heuristic for run-number estimation with machine learning, using annotated outcomes from 132,000 Nevergrad runs on the COCO platform. They extract a 23-dimensional feature set covering zero-cost, statistical, and shape-stability characteristics and build a reliability prediction framework for fixed optimizer configurations. By training and testing within each configuration and prioritizing recall on unreliable estimates (the minority class), the approach detects unreliable estimates with high minority-class recall. The results show that this learning-based paradigm is feasible, although performance remains constrained by the limited data diversity within individual configurations, which leaves room for improvement.
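Minority-class recall here means the fraction of truly unreliable run-number estimates that a classifier flags as unreliable. The sketch below shows that metric together with a within-configuration split, in which every train/test partition contains records from a single optimizer only. The record layout (an `optimizer` key), the label encoding (1 = unreliable), the 80/20 split, and the function names are illustrative assumptions, not the paper's pipeline.

```python
import random
from collections import defaultdict

UNRELIABLE = 1  # assumed label encoding: 1 = unreliable estimate (minority class)

def minority_recall(y_true, y_pred, minority=UNRELIABLE):
    """Fraction of truly unreliable estimates that the classifier also flags."""
    flagged = [p == minority for t, p in zip(y_true, y_pred) if t == minority]
    if not flagged:
        return float("nan")  # no minority examples in this test split
    return sum(flagged) / len(flagged)

def within_configuration_splits(records, test_fraction=0.2, seed=0):
    """Yield (optimizer, train, test) where train and test come from the same optimizer."""
    by_optimizer = defaultdict(list)
    for rec in records:
        by_optimizer[rec["optimizer"]].append(rec)
    rng = random.Random(seed)
    for optimizer, recs in sorted(by_optimizer.items()):
        recs = recs[:]  # copy so the caller's list is not reordered
        rng.shuffle(recs)
        n_test = max(1, int(len(recs) * test_fraction))
        yield optimizer, recs[n_test:], recs[:n_test]

# Three unreliable estimates, two of them flagged by the classifier.
print(minority_recall([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))  # → 0.6666666666666666
```

A separate model would be trained on each `train` list and scored on the matching `test` list, so no optimizer's data leaks into another optimizer's evaluation.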

📝 Abstract

In large-scale benchmarking of stochastic optimization algorithms, the key challenge is no longer whether repeated runs are needed for reliability, but how to determine when sufficient evidence has been collected without incurring unnecessary computational cost. We study a learning-based extension of a recent empirical online heuristic that adaptively estimates the required number of runs using outlier handling and skewness-based symmetry checks. Using annotated outcomes from 132,000 Nevergrad runs on COCO (24 problems in 20 dimensions, 10 instances each, 11 optimizers), we train classifiers on 23 statistical, energy-free, and shape and stability features to predict whether a run-number estimate is reliable, prioritizing detection of incorrect estimates via minority-class recall. We evaluate reliability prediction using a within-configuration learning setup, where models are trained and tested on data sharing the same optimizer. The results show that run-number reliability can be learned in a within-configuration scenario, enabling detection of unreliable estimates with high minority-class recall, although performance remains limited by the restricted data diversity within fixed configurations.
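The abstract describes the underlying heuristic only at a high level: runs are added adaptively, outliers are handled, and a skewness-based symmetry check decides when to stop. The sketch below illustrates that general idea under stated assumptions and is not the published heuristic. Tukey's 1.5×IQR fences stand in for the outlier handling, the Fisher-Pearson coefficient g1 stands in for the skewness measure, and the starting size, batch size, run budget, and tolerance of 0.5 are hypothetical values.

```python
import random
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def sample_skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def estimate_num_runs(run_once, n_start=10, batch=5, n_max=100, skew_tol=0.5):
    """Add runs in batches until the outlier-filtered results look roughly symmetric.

    Returns (runs used, passed); passed is False if the budget n_max ran out first.
    All defaults are hypothetical, not values from the paper.
    """
    results = [run_once() for _ in range(n_start)]
    while True:
        kept = remove_outliers_iqr(results)
        if abs(sample_skewness(kept)) <= skew_tol:
            return len(results), True
        if len(results) >= n_max:
            return len(results), False
        # Add the next batch without exceeding the run budget n_max.
        results.extend(run_once() for _ in range(min(batch, n_max - len(results))))

# Example: each "run" returns a final objective value; a seeded Gaussian stands in here.
rng = random.Random(42)
print(estimate_num_runs(lambda: rng.gauss(0.0, 1.0)))
```

In the framing of the abstract, the learned classifier sits on top of a procedure like this one and predicts, from features of the collected runs, whether the run count it returned is reliable.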

Problem

Research questions and friction points this paper is trying to address.

stochastic optimization

reliability assessment

number-of-runs estimation

benchmarking

computational cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

learning-based reliability assessment

stochastic optimization benchmarking

adaptive run-number estimation