🤖 AI Summary
Manual design of heuristics for combinatorial optimization problems (COPs) heavily relies on domain expertise and is time-consuming. Method: This paper proposes a novel framework that synergistically integrates large language models (LLMs) with Monte Carlo tree search (MCTS) for automated heuristic optimization. LLMs leverage semantic generation and self-reflection to produce candidate heuristics, while MCTS performs reward-guided, sequential evaluation and iterative refinement over structured state spaces—establishing a closed-loop “generate–evaluate–revise” pipeline. Contribution/Results: To our knowledge, this is the first work to deeply fuse LLMs’ symbolic reasoning capabilities with MCTS’s interpretable decision-making mechanism, overcoming limitations of both manual design and black-box LLM-based approaches. On benchmark tasks—including the Traveling Salesman Problem (TSP) and Flow Shop Scheduling Problem (FSSP)—our method significantly outperforms handcrafted heuristics and existing LLM-driven automatic design methods, achieving state-of-the-art performance in heuristic automation for COPs.
📝 Abstract
Heuristics have achieved great success in solving combinatorial optimization problems (COPs). However, heuristics designed by humans require substantial domain knowledge and testing time. Large Language Models (LLMs), with their strong capabilities to understand and generate content and a knowledge base covering diverse domains, offer a novel way to optimize heuristics automatically. We therefore propose Planning of Heuristics (PoH), an optimization method that integrates the self-reflection of LLMs with Monte Carlo Tree Search (MCTS), a well-known planning algorithm. PoH iteratively refines generated heuristics by evaluating their performance and producing improvement suggestions: it treats generated heuristics as states, improvement suggestions as actions, and evaluation results as rewards, effectively simulating future states to search for paths with higher rewards. In this paper, we apply PoH to the Traveling Salesman Problem (TSP) and the Flow Shop Scheduling Problem (FSSP). Experimental results show that PoH outperforms hand-crafted heuristics and other LLM-based Automatic Heuristic Design (AHD) methods, achieving significant improvements and state-of-the-art performance in automating heuristic optimization with LLMs to solve COPs.
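The state–action–reward loop described in the abstract can be illustrated with a toy MCTS over heuristic parameters. This is a minimal sketch, not the paper's implementation: the LLM's improvement suggestions are stood in for by random perturbations of a weight vector driving a greedy TSP constructor, and the instance data, expansion limit, and exploration constant are all illustrative assumptions.

```python
import math
import random

# Toy TSP instance: city coordinates (illustrative data, not from the paper).
CITIES = [(0, 0), (1, 5), (5, 2), (6, 6), (2, 2), (7, 1)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(order):
    return sum(dist(CITIES[order[i]], CITIES[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# A "heuristic" here is a weight vector scoring candidate next cities during
# greedy construction; in PoH the state would be LLM-generated heuristic code.
def greedy_tour(weights):
    unvisited = set(range(1, len(CITIES)))
    order = [0]
    while unvisited:
        cur = order[-1]
        nxt = min(unvisited,
                  key=lambda j: weights[0] * dist(CITIES[cur], CITIES[j])
                  + weights[1] * CITIES[j][0] + weights[2] * CITIES[j][1])
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def evaluate(weights):
    # Reward: negative tour length (higher is better).
    return -tour_length(greedy_tour(weights))

class Node:
    def __init__(self, weights, parent=None):
        self.weights = weights      # state: the current heuristic
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    # Standard UCB1 selection score; unvisited nodes are tried first.
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def refine(weights, rng):
    # Action: stand-in for an LLM improvement suggestion (perturb one weight).
    w = list(weights)
    i = rng.randrange(len(w))
    w[i] += rng.uniform(-0.5, 0.5)
    return tuple(w)

def mcts(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node((1.0, 0.0, 0.0))   # start from plain nearest-neighbor
    best = (evaluate(root.weights), root.weights)
    for _ in range(iterations):
        # Selection: descend by UCB until a node with fewer than 3 children.
        node = root
        while len(node.children) >= 3:
            node = max(node.children, key=ucb)
        # Expansion: apply one refinement action to get a new heuristic.
        child = Node(refine(node.weights, rng), parent=node)
        node.children.append(child)
        # Evaluation (in place of a rollout) and best-so-far tracking.
        reward = evaluate(child.weights)
        best = max(best, (reward, child.weights))
        # Backpropagation.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return best

if __name__ == "__main__":
    best_reward, best_weights = mcts()
    print(-best_reward)  # best tour length found
```

The search can only improve on the initial heuristic, since the best-so-far tracker is seeded with the root's own evaluation; swapping `refine` for an actual LLM call (and weights for generated code) recovers the shape of the closed-loop pipeline the paper describes.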