BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven Automatic Algorithm Discovery (AAD) lacks standardized, scalable evaluation frameworks, particularly for continuous black-box optimization. Method: This paper introduces the first modular AAD benchmark framework tailored to continuous black-box optimization. It integrates established benchmark suites (MA-BBOB, SBOX-COST), a parametric instance generator, structured natural-language problem templates, and analysis interfaces (IOHanalyser, IOHexplainer) alongside Code Evolution Graph tools. Contribution/Results: The framework enables capability-oriented evaluation across generalization, specialization, and information utilization. Validated through two representative use cases, mutation-based prompting and function specialization, it improves the systematicity, reproducibility, and interpretability of AAD assessment. As open-source, plug-and-play infrastructure, it establishes a standardized foundation for rigorously characterizing the capabilities and limitations of LLMs in optimization algorithm design.

📝 Abstract
The application of Large Language Models (LLMs) for Automated Algorithm Discovery (AAD), particularly for optimisation heuristics, is an emerging field of research. This emergence necessitates robust, standardised benchmarking practices to rigorously evaluate the capabilities and limitations of LLM-driven AAD methods and the resulting generated algorithms, especially given the opacity of their design process and known issues with existing benchmarks. To address this need, we introduce BLADE (Benchmark suite for LLM-driven Automated Design and Evolution), a modular and extensible framework specifically designed for benchmarking LLM-driven AAD methods in a continuous black-box optimisation context. BLADE integrates collections of benchmark problems (including MA-BBOB and SBOX-COST among others) with instance generators and textual descriptions aimed at capability-focused testing, such as generalisation, specialisation and information exploitation. It offers flexible experimental setup options, standardised logging for reproducibility and fair comparison, incorporates methods for analysing the AAD process (e.g., Code Evolution Graphs and various visualisation approaches) and facilitates comparison against human-designed baselines through integration with established tools like IOHanalyser and IOHexplainer. BLADE provides an 'out-of-the-box' solution to systematically evaluate LLM-driven AAD approaches. The framework is demonstrated through two distinct use cases exploring mutation prompt strategies and function specialisation.
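To make the evaluation workflow concrete, the sketch below shows what a minimal benchmark loop of the kind BLADE standardises might look like. This is a hypothetical illustration, not BLADE's actual API: the `sphere` and `rastrigin` functions stand in for MA-BBOB/SBOX-COST problem instances, and `random_search` stands in for an LLM-generated heuristic. Real runs would use the framework's suites, instance generators, and logging.

```python
import math
import random

# Hypothetical stand-ins for benchmark problem instances (real BLADE
# runs would draw these from suites such as MA-BBOB or SBOX-COST).
def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def random_search(objective, dim, budget, rng):
    """Placeholder for an LLM-generated heuristic: uniform random
    search in [-5, 5]^dim, returning the best value found."""
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(best, objective(x))
    return best

def benchmark(heuristic, problems, dim=5, budget=200, runs=3):
    """Evaluate a candidate heuristic on each problem over several
    seeded runs and record the mean best fitness, mimicking the
    standardised, reproducible logging a benchmark suite provides."""
    log = {}
    for name, objective in problems:
        scores = [heuristic(objective, dim, budget, random.Random(seed))
                  for seed in range(runs)]
        log[name] = sum(scores) / runs
    return log

results = benchmark(random_search, [("sphere", sphere), ("rastrigin", rastrigin)])
print(results)
```

In the real framework, the logged trajectories would then be handed to analysis tools such as IOHanalyser for comparison against human-designed baselines; fixed seeds per run are what make such comparisons reproducible.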
Problem

Research questions and friction points this paper is trying to address.

Standardized benchmarking for LLM-driven automated algorithm discovery
Evaluating capabilities and limitations of LLM-generated optimization heuristics
Modular framework for continuous black-box optimization benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework for LLM-driven algorithm benchmarking
Integrates benchmark problems with instance generators
Standardized logging for reproducibility and comparison
👥 Authors
N. van Stein
LIACS, Leiden University, Leiden, Netherlands
Anna V. Kononova
LIACS, Leiden University, Leiden, Netherlands
Haoran Yin
Leiden University
T. Bäck
LIACS, Leiden University, Leiden, Netherlands