BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven Automatic Algorithm Discovery (AAD) lacks standardized, scalable evaluation frameworks, particularly for continuous black-box optimization. Method: This paper introduces the first modular AAD benchmark framework tailored to continuous black-box optimization. It integrates established benchmark suites (MA-BBOB, SBOX-COST), a parametric instance generator, structured natural-language problem templates, and analysis interfaces (IOHanalyser, IOHexplainer) alongside Code Evolution Graph tools. Contribution/Results: The framework enables capability-oriented evaluation across generalization, specialization, and information utilization. Validated through two representative use cases, mutation-based prompting and function specialization, it improves the systematicity, reproducibility, and interpretability of AAD assessment. As open-source, plug-and-play infrastructure, it establishes a standardized foundation for rigorously characterizing the capabilities and limitations of LLMs in optimization algorithm design.

📝 Abstract
The application of Large Language Models (LLMs) for Automated Algorithm Discovery (AAD), particularly for optimisation heuristics, is an emerging field of research. This emergence necessitates robust, standardised benchmarking practices to rigorously evaluate the capabilities and limitations of LLM-driven AAD methods and the resulting generated algorithms, especially given the opacity of their design process and known issues with existing benchmarks. To address this need, we introduce BLADE (Benchmark suite for LLM-driven Automated Design and Evolution), a modular and extensible framework specifically designed for benchmarking LLM-driven AAD methods in a continuous black-box optimisation context. BLADE integrates collections of benchmark problems (including MA-BBOB and SBOX-COST among others) with instance generators and textual descriptions aimed at capability-focused testing, such as generalisation, specialisation and information exploitation. It offers flexible experimental setup options, standardised logging for reproducibility and fair comparison, incorporates methods for analysing the AAD process (e.g., Code Evolution Graphs and various visualisation approaches) and facilitates comparison against human-designed baselines through integration with established tools like IOHanalyser and IOHexplainer. BLADE provides an 'out-of-the-box' solution to systematically evaluate LLM-driven AAD approaches. The framework is demonstrated through two distinct use cases exploring mutation prompt strategies and function specialisation.
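To make the evaluation workflow concrete, the sketch below shows what a minimal benchmark loop of the kind BLADE standardises might look like. This is a hypothetical illustration, not BLADE's actual API: the `sphere` and `rastrigin` functions stand in for MA-BBOB/SBOX-COST problem instances, and `random_search` stands in for an LLM-generated heuristic. Real runs would use the framework's suites, instance generators, and logging.

```python
import math
import random

# Hypothetical stand-ins for benchmark problem instances (real BLADE
# runs would draw these from suites such as MA-BBOB or SBOX-COST).
def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def random_search(objective, dim, budget, rng):
    """Placeholder for an LLM-generated heuristic: uniform random
    search in [-5, 5]^dim, returning the best value found."""
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(best, objective(x))
    return best

def benchmark(heuristic, problems, dim=5, budget=200, runs=3):
    """Evaluate a candidate heuristic on each problem over several
    seeded runs and record the mean best fitness, mimicking the
    standardised, reproducible logging a benchmark suite provides."""
    log = {}
    for name, objective in problems:
        scores = [heuristic(objective, dim, budget, random.Random(seed))
                  for seed in range(runs)]
        log[name] = sum(scores) / runs
    return log

results = benchmark(random_search, [("sphere", sphere), ("rastrigin", rastrigin)])
print(results)
```

In the real framework, the logged trajectories would then be handed to analysis tools such as IOHanalyser for comparison against human-designed baselines; fixed seeds per run are what make such comparisons reproducible.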
Problem

Research questions and friction points this paper is trying to address.

Standardized benchmarking for LLM-driven automated algorithm discovery
Evaluating capabilities and limitations of LLM-generated optimization heuristics
Modular framework for continuous black-box optimization benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework for LLM-driven algorithm benchmarking
Integrates benchmark problems with instance generators
Standardized logging for reproducibility and comparison
👥 Authors
N. van Stein
LIACS, Leiden University, Leiden, Netherlands
Anna V. Kononova
LIACS, Leiden University, Leiden, Netherlands
Haoran Yin
Leiden University
T. Bäck
LIACS, Leiden University, Leiden, Netherlands