AI Summary
Existing continuous black-box optimization benchmarks, such as BBOB, exhibit limited landscape diversity, hindering comprehensive algorithm evaluation. To address this, this work uses the LLaMEA framework, which embeds large language models in an evolutionary loop, to generate interpretable optimization problems with user-specified high-level landscape characteristics from natural-language descriptions. The evolution is guided by Exploratory Landscape Analysis (ELA) property predictors, and novelty is encouraged through a fitness-sharing mechanism in ELA space that significantly increases population diversity. The generated functions faithfully reproduce the target structural properties and extend the BBOB instance space, yielding an LLM-driven, controllable, reproducible, and broadly representative benchmark suite for continuous black-box optimization.
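The fitness-sharing idea can be sketched as follows. This is an illustrative reconstruction, not the paper's exact implementation: the sharing radius `sigma` and exponent `alpha` are assumed parameters, and the classic triangular sharing function is used in place of whatever kernel the authors chose.

```python
import numpy as np

def shared_fitness(fitness, ela_features, sigma=1.0, alpha=1.0):
    """Fitness sharing applied in ELA feature space (illustrative sketch).

    Candidates whose ELA feature vectors lie within radius `sigma` of each
    other divide their raw fitness among themselves, penalising redundant
    landscapes and rewarding structurally novel ones.

    fitness      : (n,) raw fitness of each candidate problem
    ela_features : (n, d) ELA feature vector per candidate
    sigma, alpha : assumed sharing-radius / shape parameters
    """
    fitness = np.asarray(fitness, dtype=float)
    X = np.asarray(ela_features, dtype=float)
    # Pairwise Euclidean distances between ELA feature vectors.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Triangular sharing function: sh(d) = 1 - (d / sigma)^alpha for d < sigma.
    sh = np.where(dists < sigma, 1.0 - (dists / sigma) ** alpha, 0.0)
    # Niche count m_i = sum_j sh(d_ij); each candidate shares its fitness.
    niche_counts = sh.sum(axis=1)
    return fitness / niche_counts
```

Two candidates with identical ELA features halve each other's fitness, while an isolated candidate keeps its full score, which is exactly the pressure toward diverse landscapes described above.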
Abstract
Benchmarking in continuous black-box optimisation is hindered by the limited structural diversity of existing test suites such as BBOB. We explore whether large language models embedded in an evolutionary loop can design optimisation problems with clearly defined high-level landscape characteristics. Using the LLaMEA framework, we guide an LLM to generate problem code from natural-language descriptions of target properties, including multimodality, separability, basin-size homogeneity, search-space homogeneity, and global-local optima contrast. Inside the loop we score candidates with ELA-based property predictors. We introduce an ELA-space fitness-sharing mechanism that increases population diversity and steers the generator away from redundant landscapes. A complementary evaluation combining basin-of-attraction analysis, statistical testing, and visual inspection verifies that many of the generated functions indeed exhibit the intended structural traits. In addition, a t-SNE embedding shows that they extend the BBOB instance space rather than forming an unrelated cluster. The resulting library provides a broad, interpretable, and reproducible set of benchmark problems for landscape analysis and downstream tasks such as automated algorithm selection.
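A basin-of-attraction analysis of the kind mentioned above can be approximated as follows. This is a toy sketch under stated assumptions, not the paper's procedure: it descends from random starts with a finite-difference gradient and merges nearby end points, so each cluster stands for one local optimum and its size for the relative basin volume. All parameters (`n_starts`, `step`, `merge_tol`) are hypothetical.

```python
import numpy as np

def estimate_basins(f, lower, upper, n_starts=150, step=0.01,
                    iters=500, merge_tol=0.05, seed=0):
    """Rough basin-of-attraction probe (illustrative sketch).

    Runs simple finite-difference gradient descent from `n_starts`
    random points and greedily clusters the end points; the number of
    clusters estimates the number of local optima, and each cluster's
    size approximates the relative size of its basin.
    """
    rng = np.random.default_rng(seed)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    d = lo.size
    eps = 1e-5
    ends = []
    for _ in range(n_starts):
        x = rng.uniform(lo, hi)
        for _ in range(iters):
            # Central finite-difference gradient estimate.
            grad = np.array([
                (f(x + eps * np.eye(d)[i]) - f(x - eps * np.eye(d)[i])) / (2 * eps)
                for i in range(d)
            ])
            x = np.clip(x - step * grad, lo, hi)
        ends.append(x)
    # Greedy clustering: merge end points closer than merge_tol.
    centers, sizes = [], []
    for x in ends:
        for k, c in enumerate(centers):
            if np.linalg.norm(x - c) < merge_tol:
                sizes[k] += 1
                break
        else:
            centers.append(x)
            sizes.append(1)
    return centers, sizes
```

On the double-well function (x² − 1)² this recovers two basins around x = ±1, while a convex bowl yields a single basin, which is the kind of structural evidence the verification step relies on.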