🤖 AI Summary
This work addresses the challenge of efficiently optimizing functions from the Generalized Numerical Benchmark Generator (GNBG) under strict black-box conditions. The authors propose ARES-LSHADE, a memetic framework that couples an LLM-driven autonomous research loop with a multi-strategy hybrid search. Building on the LLM-LSHADE 2025 competition winner, the method adds an LLM-designed, scout-augmented mutation operator with adaptive CMA-ES integration and a multi-start L-BFGS-B phase for local refinement. It avoids any use of benchmark meta-information that would violate the black-box assumption. Over the official 31 runs per function, the algorithm reaches machine precision on 18 of 24 functions and succeeds in 510 of 744 runs (final gap below 1e-8). The six remaining functions show plateau signatures consistent with GNBG's compositional structure, and the research loop independently flagged them as the hardest in the suite.
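For readers unfamiliar with the base algorithm, the LSHADE mutation that ARES-LSHADE extends is current-to-pbest/1 with an external archive: v_i = x_i + F_i * (x_pbest - x_i) + F_i * (x_r1 - x_r2), where x_pbest is drawn from the best p fraction of the population and x_r2 may also come from the archive. The following is a minimal NumPy sketch of that standard operator only. It assumes the per-individual F values are supplied by the caller (LSHADE samples them from its success-history memory). The function name is hypothetical, and the scout augmentation and CMA-ES integration added by ARES-LSHADE are not reproduced.

```python
import numpy as np


def current_to_pbest_1(pop, fitness, archive, F, p, rng):
    """Standard LSHADE current-to-pbest/1 mutation with an external archive.

    pop: (NP, D) population, NP >= 3; fitness: (NP,) values to minimize;
    archive: (A, D) array of displaced parents (A may be 0);
    F: (NP,) per-individual scale factors; p: greedy fraction in (0, 1].
    """
    NP, D = pop.shape
    # Indices of the best ceil-ish p*NP individuals (at least 2).
    n_best = max(2, int(round(p * NP)))
    best_idx = np.argsort(fitness)[:n_best]
    # r2 is drawn from the union of the population and the archive.
    union = np.vstack([pop, archive])
    mutants = np.empty_like(pop)
    for i in range(NP):
        pbest = pop[rng.choice(best_idx)]
        # r1: population member distinct from i.
        r1 = rng.choice([j for j in range(NP) if j != i])
        # r2: member of population or archive, distinct from i and r1.
        r2 = rng.choice([j for j in range(len(union)) if j not in (i, r1)])
        mutants[i] = pop[i] + F[i] * (pbest - pop[i]) + F[i] * (pop[r1] - union[r2])
    return mutants
```

In LSHADE the archive stores parents displaced by better trial vectors, which keeps the difference vector x_r1 - x_r2 diverse without extra function evaluations.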
📝 Abstract
We present ARES-LSHADE, a memetic differential-evolution variant submitted to the GECCO 2026 competition on LLM-designed evolutionary algorithms for the Generalized Numerical Benchmark Generator (GNBG). The algorithm builds on the LLM-LSHADE 2025 winner and contributes two new components: (a) a scout-augmented mutation operator with adaptive CMA-ES integration, produced by an autonomous research loop across approximately thirty LLM-driven design experiments, and (b) a multi-start L-BFGS-B polish phase that respects strict black-box treatment of the benchmark. On the official evaluation of 31 runs per function with the competition-specified function-evaluation budgets, ARES-LSHADE obtains 510 of 744 wins (24 functions × 31 runs, where a win is a run whose final gap is below 1e-8), reaching machine precision on 18 of 24 functions. The remaining six functions exhibit characteristic plateau signatures consistent with GNBG's compositional structure and were independently identified by the autoresearch loop as the hardest of the suite. Beyond the result itself, this report documents two methodological observations: (i) an LLM-driven research loop with an operator-only edit surface and a fitness-only observation space converges to a characteristic plateau on this benchmark; (ii) when we initially widened the observation space to include the benchmark's compositional metadata, the resulting algorithm trivially solved all 24 functions but violated the competition's black-box rule, a violation we identified before submission. We discuss this tension between LLM capability and benchmark integrity as a design consideration for future LLM-driven optimization-algorithm research. Code and reproducibility artifacts are available at https://github.com/anaeem1/ARES-LSHADE.
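A polish phase of this kind can be illustrated with SciPy's L-BFGS-B. The minimal sketch below assumes that only f(x) values may be observed (no analytic gradients and no GNBG metadata), so gradients come from SciPy's finite differences. Every call, including the finite-difference probes, is charged against a hard evaluation budget. The names `multistart_lbfgsb_polish` and `BudgetExhausted`, and the way start points are chosen, are hypothetical and not taken from the ARES-LSHADE repository.

```python
import numpy as np
from scipy.optimize import minimize


class BudgetExhausted(Exception):
    """Raised by the wrapped objective once the evaluation budget is used up."""


def multistart_lbfgsb_polish(f, starts, bounds, budget):
    """Polish start points with L-BFGS-B under a hard evaluation budget.

    f is treated as a black box: only f(x) values are used, and no jac is
    supplied, so SciPy estimates gradients by finite differences.
    starts: (k, D) start points, e.g. the best individuals from the DE phase.
    bounds: list of (low, high) pairs; budget: max number of calls to f.
    Returns (best_x, best_f, evaluations_used).
    """
    state = {"n": 0, "best_x": None, "best_f": np.inf}

    def counted(x):
        # Refuse to evaluate once the budget is spent.
        if state["n"] >= budget:
            raise BudgetExhausted
        state["n"] += 1
        val = float(f(x))
        # Keep the best point ever evaluated, including gradient probes.
        if val < state["best_f"]:
            state["best_f"], state["best_x"] = val, np.array(x, dtype=float)
        return val

    for x0 in starts:
        try:
            minimize(counted, x0, method="L-BFGS-B", bounds=bounds)
        except BudgetExhausted:
            break
    return state["best_x"], state["best_f"], state["n"]
```

Raising an exception from the objective enforces a hard cap. L-BFGS-B's own `maxfun` option is not sufficient for this because SciPy documents that the limit may be exceeded when gradients are estimated numerically.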