🤖 AI Summary
Existing LLM-driven heuristic design relies on predefined evolutionary operators and single-task training, limiting algorithmic diversity and generalization across problem scales.
Method: We propose Meta-Optimization of Heuristics (MoH), a framework that integrates meta-learning into the optimizer-design layer of heuristic generation, enabling LLMs to autonomously discover, construct, and iteratively refine diverse optimizers. MoH introduces a self-referential mechanism in which the meta-optimizer constructs new optimizers through (self-)invocation, yielding interpretable, self-generated optimizers, and employs a multi-task training scheme to improve generalization across problem scales. The approach combines LLMs, meta-learning, self-reference, multi-task training, and evolutionary search.
Contribution/Results: MoH achieves state-of-the-art performance on classic combinatorial optimization benchmarks, outperforming baselines particularly in cross-size settings. The constructed meta-optimizer is interpretable, and the heuristics it evolves transfer well across problem sizes.
📝 Abstract
Heuristic design with large language models (LLMs) has emerged as a promising approach for tackling combinatorial optimization problems (COPs). However, existing approaches often rely on manually predefined evolutionary computation (EC) optimizers and single-task training schemes, which may constrain the exploration of diverse heuristic algorithms and hinder the generalization of the resulting heuristics. To address these issues, we propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the optimizer level, discovering effective optimizers through the principle of meta-learning. Specifically, MoH leverages LLMs to iteratively refine a meta-optimizer that autonomously constructs diverse optimizers through (self-)invocation, thereby eliminating the reliance on a predefined EC optimizer. These constructed optimizers subsequently evolve heuristics for downstream tasks, enabling broader heuristic exploration. Moreover, MoH employs a multi-task training scheme to promote its generalization capability. Experiments on classic COPs demonstrate that MoH constructs an effective and interpretable meta-optimizer, achieving state-of-the-art performance across various downstream tasks, particularly in cross-size settings.
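To make the two-level structure concrete, here is a minimal toy sketch of the meta-optimization idea: an inner loop where an "optimizer" evolves a population of heuristics, and an outer loop that searches over candidate optimizers by the quality of the heuristics they produce. All names and the quadratic scoring function are illustrative assumptions; in MoH itself, the optimizers are LLM-generated programs refined by the LLM rather than the parameterized mutation strategies used here.

```python
import random

random.seed(0)

# Toy stand-in for a downstream task: a heuristic is a weight vector,
# and its "performance" is higher the closer it is to the optimum 0.5.
# (In MoH, this would be the heuristic's score on a COP benchmark.)
def evaluate_heuristic(w):
    return -sum((wi - 0.5) ** 2 for wi in w)

# An "optimizer" evolves a population of heuristics. Here, optimizers
# are hypothetical mutation strategies parameterized by a step size,
# standing in for LLM-constructed optimizer code.
def make_optimizer(step):
    def optimizer(population, generations=20):
        for _ in range(generations):
            parent = max(population, key=evaluate_heuristic)
            child = [wi + random.gauss(0, step) for wi in parent]
            population.append(child)
            # Elitist selection: keep the five best heuristics.
            population.sort(key=evaluate_heuristic, reverse=True)
            population[:] = population[:5]
        return population[0]
    return optimizer

# Meta-level search: rank candidate optimizers by how good the
# heuristics they evolve are. (In MoH, an LLM proposes and iteratively
# refines the optimizers instead of enumerating a fixed candidate set.)
def meta_optimize(candidate_steps):
    best_step, best_score = None, float("-inf")
    for step in candidate_steps:
        optimizer = make_optimizer(step)
        population = [[random.random() for _ in range(4)] for _ in range(5)]
        best_heuristic = optimizer(population)
        score = evaluate_heuristic(best_heuristic)
        if score > best_score:
            best_step, best_score = step, score
    return best_step, best_score

best_step, best_score = meta_optimize([0.01, 0.1, 0.5])
print(best_step, best_score)
```

The key point of the sketch is the separation of levels: the inner loop optimizes heuristics, while the outer loop optimizes the optimizer itself, which is what removes the dependence on a single predefined EC optimizer.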