🤖 AI Summary
This work addresses the limitations of existing large language model (LLM)-driven approaches to automated heuristic design, which are constrained by fixed evolutionary rules and static prompting templates that hinder long-horizon reasoning and efficient evolution. The authors propose modeling heuristic generation as a sequential decision-making process over an entailment graph, a stateful memory structure that enables cross-generation information reuse and conflict avoidance. A multi-agent collaborative framework is developed, comprising a policy agent that plans evolutionary actions, a world model agent that simulates heuristic performance, and a critic agent that performs routing-based reflection, thereby transforming trial-and-error evolution into state-aware, planning-driven search. The method converges significantly faster and yields superior heuristics across multiple combinatorial optimization problems, is compatible with diverse LLM backbones, and exhibits strong scalability.
📝 Abstract
Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and to reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.