🤖 AI Summary
This work addresses the limited generality of heuristics generated by large language models (LLMs): existing LLM-designed heuristics target individual planning domains and do not transfer to arbitrary planning tasks. The authors propose a framework that combines LLMs with evolutionary search, using a MAP-Elites archive keyed on informedness and speed to maintain behavioral diversity while evolving domain-independent heuristics for symbolic AI planning. Fitness is a composite score that blends coverage with solving time. On previously unseen domains, the best evolved heuristic solves more tasks than the strongest human-designed baseline, a first for LLM-generated heuristics. The study also benchmarks hand-engineered heuristics on the tradeoff between informedness and computational speed, with the evolved heuristics spanning its Pareto frontier, and finds that evolution seeded with the trivial blind heuristic yields better results than seeding with the strong FF heuristic. Because the resulting heuristics are plain C++, they plug into existing planning systems as drop-in replacements.
📝 Abstract
Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planning researchers. Recent work has shown that large language models (LLMs) can design heuristics for individual planning domains, but no LLM-generated heuristic has so far worked on arbitrary planning tasks. In this paper, we use evolutionary search to produce the first LLM-generated domain-independent heuristics that exceed the hand-engineered state of the art. We let an LLM mutate parent heuristics written in C++, store candidates in a MAP-Elites archive keyed on informedness and speed, and calculate fitness scores by blending coverage with solving time. To place the evolved programs in context, we additionally benchmark a broad set of hand-engineered heuristics on their informedness-speed tradeoff, which to our knowledge has not been done before. On unseen testing domains, our best evolved heuristic solves more tasks than even the strongest baseline, with our full heuristic suite spanning the Pareto frontier of said tradeoff. We also find that seeding evolution from the trivial blind heuristic outperforms seeding from the strong FF heuristic, even when the resulting program is itself an FF variant, and that LLM reasoning effort affects how often candidates compile much more than the quality of those that do. Because the evolved programs are plain C++, they slot into existing planners as drop-in replacements and inherit the soundness and completeness guarantees of the underlying search.
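To make the archive mechanism concrete, the following is a minimal C++17 sketch of a MAP-Elites archive keyed on informedness and speed, with a fitness that blends coverage with solving time. It is an illustration under stated assumptions, not the authors' implementation: the names `Evaluation`, `Candidate`, `blended_fitness`, and `MapElitesArchive` are hypothetical, the descriptors are assumed to be normalised to [0, 1], and the weighted-sum blend with its default weight is a placeholder for whatever formula the paper actually uses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical evaluation record for one candidate heuristic.
// All fields are assumed to be normalised to [0, 1].
struct Evaluation {
    double coverage;      // fraction of training tasks solved
    double time_score;    // solving-time score, higher means faster
    double informedness;  // behaviour descriptor 1 (archive axis)
    double speed;         // behaviour descriptor 2 (archive axis)
};

// A candidate pairs the heuristic's C++ source with its evaluation.
struct Candidate {
    std::string source;
    Evaluation eval;
};

// Assumed blend: a weighted sum dominated by coverage. The weight is
// illustrative only and is not taken from the paper.
double blended_fitness(const Evaluation& e, double time_weight = 0.1) {
    assert(time_weight >= 0.0 && time_weight <= 1.0);
    return (1.0 - time_weight) * e.coverage + time_weight * e.time_score;
}

// A Bins x Bins grid that keeps only the fittest candidate per cell.
template <std::size_t Bins>
class MapElitesArchive {
public:
    // Returns true if the candidate became the elite of its cell.
    bool insert(const Candidate& c) {
        auto& cell = cells_[bin(c.eval.informedness)][bin(c.eval.speed)];
        if (cell && blended_fitness(cell->eval) >= blended_fitness(c.eval)) {
            return false;  // the incumbent is at least as fit
        }
        cell = c;
        return true;
    }

    // Number of occupied cells.
    std::size_t size() const {
        std::size_t n = 0;
        for (const auto& row : cells_) {
            for (const auto& cell : row) {
                if (cell) ++n;
            }
        }
        return n;
    }

private:
    // Map a descriptor in [0, 1] to a bin index in [0, Bins - 1].
    static std::size_t bin(double x) {
        const double clamped = std::clamp(x, 0.0, 1.0);
        return std::min(static_cast<std::size_t>(clamped * Bins), Bins - 1);
    }

    std::array<std::array<std::optional<Candidate>, Bins>, Bins> cells_{};
};
```

In an evolutionary loop, a parent would be sampled from an occupied cell, mutated by the LLM, compiled and evaluated, and then passed to `insert`. Because each cell keeps its own elite, a slow but well-informed heuristic and a fast but weakly informed one can coexist in the archive instead of competing for a single best slot.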