🤖 AI Summary
This work addresses a limitation of existing large language model (LLM)-based heuristic design methods: they rely on delayed final-performance feedback, which gives evolutionary search little guidance about individual decisions. The authors propose a teacher-aware evolutionary framework that, for the first time, employs an independently trained reinforcement learning policy as a behavioral teacher. The teacher is queried for its action preferences on the states visited by candidate heuristics, which yields fine-grained, local feedback and lets the search optimize task performance and behavioral consistency together. By integrating LLMs, evolutionary algorithms, and the policy teacher, the method generates high-performing static heuristic rules that require no neural network inference at deployment. Experiments on scheduling, routing, and graph optimization benchmarks show improvements over baselines that rely solely on performance-based feedback.
📝 Abstract
LLM-based automatic heuristic design has shown promise for generating executable heuristics for combinatorial optimization, but existing methods mainly rely on delayed endpoint performance. We propose a teacher-aware evolutionary framework that uses independently trained learned optimization policies as behavioral teachers. Instead of deploying or imitating the teacher, our method queries it on states visited by candidate heuristic programs and uses its action preferences as local feedback for evolution. The resulting search discovers static executable heuristics guided by both task performance and teacher-derived behavioral signals. Experiments on scheduling, routing, and graph optimization benchmarks show that our method improves over performance-driven LLM heuristic evolution baselines while requiring no neural inference at deployment. These results suggest that learned optimization policies can be repurposed as behavioral feedback sources for automatic heuristic discovery.
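To make the feedback loop concrete, the following is a minimal sketch of how a candidate heuristic might be scored on both endpoint performance and teacher agreement. It is an illustration under stated assumptions, not the authors' implementation: the toy single-machine scheduling task, the names `rollout`, `teacher_agreement`, `fitness`, and `spt_teacher`, the top-choice agreement measure, and the weighted-sum combination with `weight` are all hypothetical. The teacher here is a hand-written stand-in for a trained policy.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]                    # remaining job durations (toy state)
Heuristic = Callable[[State], int]         # returns the index of the next job
Teacher = Callable[[State], List[float]]   # returns one preference score per action


def rollout(heuristic: Heuristic, jobs: State) -> Tuple[float, List[Tuple[State, int]]]:
    """Run a heuristic to completion on a toy single-machine schedule.

    Returns the total completion time (lower is better) and the list of
    (state, action) pairs the heuristic visited.
    """
    state, clock, total = tuple(jobs), 0, 0
    trace: List[Tuple[State, int]] = []
    while state:
        action = heuristic(state)
        trace.append((state, action))
        clock += state[action]          # the chosen job finishes at this time
        total += clock
        state = state[:action] + state[action + 1:]
    return float(total), trace


def teacher_agreement(trace: List[Tuple[State, int]], teacher: Teacher) -> float:
    """Fraction of visited states where the heuristic picked a top-rated teacher action."""
    if not trace:
        return 0.0
    hits = 0
    for state, action in trace:
        prefs = teacher(state)          # the teacher is only queried, never deployed
        hits += int(prefs[action] == max(prefs))
    return hits / len(trace)


def fitness(heuristic: Heuristic, teacher: Teacher, jobs: State, weight: float = 0.5) -> float:
    """Assumed combination: negative cost plus a weighted agreement bonus."""
    cost, trace = rollout(heuristic, jobs)
    return -cost + weight * teacher_agreement(trace, teacher)


def spt_teacher(state: State) -> List[float]:
    """Stand-in teacher (assumption): prefers shorter jobs; a real teacher would be a trained policy."""
    return [-float(duration) for duration in state]


def first_come(state: State) -> int:
    """Candidate heuristic: always schedule the first remaining job."""
    return 0


def shortest_first(state: State) -> int:
    """Candidate heuristic: schedule the shortest remaining job."""
    return min(range(len(state)), key=lambda i: state[i])
```

In an evolutionary loop, a score like `fitness(candidate, spt_teacher, jobs)` would rank LLM-proposed programs, and the per-state disagreements could be fed back to the LLM as local hints. Only the selected static heuristic would run at deployment, so no teacher inference is needed there.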