Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization

📅 2025-02-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Manual design of heuristics for combinatorial optimization problems (COPs) heavily relies on domain expertise and is time-consuming. Method: This paper proposes a novel framework that synergistically integrates large language models (LLMs) with Monte Carlo tree search (MCTS) for automated heuristic optimization. LLMs leverage semantic generation and self-reflection to produce candidate heuristics, while MCTS performs reward-guided, sequential evaluation and iterative refinement over structured state spaces—establishing a closed-loop “generate–evaluate–revise” pipeline. Contribution/Results: To our knowledge, this is the first work to deeply fuse LLMs’ symbolic reasoning capabilities with MCTS’s interpretable decision-making mechanism, overcoming limitations of both manual design and black-box LLM-based approaches. On benchmark tasks—including the Traveling Salesman Problem (TSP) and Flow Shop Scheduling Problem (FSSP)—our method significantly outperforms handcrafted heuristics and existing LLM-driven automatic design methods, achieving state-of-the-art performance in heuristic automation for COPs.

📝 Abstract
Heuristics have achieved great success in solving combinatorial optimization problems (COPs). However, heuristics designed by humans require extensive domain knowledge and testing time. Large Language Models (LLMs), with their strong capabilities to understand and generate content and a knowledge base covering various domains, offer a novel way to optimize heuristics automatically. We therefore propose Planning of Heuristics (PoH), an optimization method that integrates the self-reflection of LLMs with Monte Carlo Tree Search (MCTS), a well-known planning algorithm. PoH iteratively refines generated heuristics by evaluating their performance and providing improvement suggestions: it treats generated heuristics as states, improvement suggestions as actions, and evaluation results as rewards, effectively simulating future states to search for paths with higher rewards. In this paper, we apply PoH to the Traveling Salesman Problem (TSP) and the Flow Shop Scheduling Problem (FSSP). Experimental results show that PoH outperforms hand-crafted heuristics and other LLM-based Automatic Heuristic Design (AHD) methods, achieving significant improvements and state-of-the-art performance in automating heuristic optimization with LLMs to solve COPs.
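The state/action/reward framing in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the LLM's improvement suggestions are replaced here by fixed numeric tweaks to a single parameter of a toy greedy TSP heuristic, and all names (`greedy_tour`, `poh_mcts`, `ACTIONS`, etc.) are hypothetical. The MCTS skeleton (UCT selection, expansion, evaluation as reward, backpropagation) is the generic algorithm the abstract refers to.

```python
import math
import random

random.seed(0)

# Toy evaluation set: a few random 2-D TSP instances.
INSTANCES = [[(random.random(), random.random()) for _ in range(20)]
             for _ in range(3)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(cities, order):
    return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def greedy_tour(cities, weight):
    """Parameterized greedy constructor (the candidate 'heuristic'): pick the
    next city minimizing distance-to-current plus weight * distance-to-start."""
    unvisited = set(range(1, len(cities)))
    order = [0]
    while unvisited:
        cur = order[-1]
        nxt = min(unvisited,
                  key=lambda j: dist(cities[cur], cities[j])
                  + weight * dist(cities[j], cities[0]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def evaluate(weight):
    # Reward = negative mean tour length over the instances (higher is better).
    return -sum(tour_length(c, greedy_tour(c, weight))
                for c in INSTANCES) / len(INSTANCES)

# Stand-in for the LLM's "improvement suggestions": fixed parameter tweaks.
ACTIONS = [-0.2, -0.05, 0.05, 0.2]

class Node:
    def __init__(self, weight, parent=None):
        self.weight, self.parent = weight, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=0.5):
    # Upper Confidence bound for Trees: mean reward plus exploration bonus.
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def poh_mcts(iterations=60):
    root = Node(0.0)                           # state: weight = 0 (nearest neighbor)
    best_w, best_r = 0.0, evaluate(0.0)        # baseline heuristic's reward
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes via UCT.
        while node.children and len(node.children) == len(ACTIONS):
            node = max(node.children, key=uct)
        # Expansion: apply the next untried "improvement suggestion" (action).
        if len(node.children) < len(ACTIONS):
            child = Node(node.weight + ACTIONS[len(node.children)], parent=node)
            node.children.append(child)
            node = child
        # Simulation: the reward is the candidate heuristic's evaluation score.
        r = evaluate(node.weight)
        if r > best_r:
            best_w, best_r = node.weight, r
        # Backpropagation: update visit counts and value sums up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_w, best_r
```

By construction the returned reward is never worse than the weight-0 baseline, which mirrors the closed "generate-evaluate-revise" loop: the tree keeps the best heuristic found while exploration continues elsewhere.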
Problem

Research questions and friction points this paper is trying to address.

Automate heuristic optimization using LLMs
Integrate LLMs with Monte Carlo Tree Search
Solve combinatorial optimization problems effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs self-reflection integration
Monte Carlo Tree Search
automated heuristic optimization
🔎 Similar Papers
No similar papers found.
Chaoxu Mu
Tianjin University
Nonlinear system control and optimization; Adaptive and learning systems; Smart grid
Xufeng Zhang
School of Artificial Intelligence, Anhui University, Hefei, Anhui, China
Hui Wang
School of Artificial Intelligence, Anhui University, Hefei, Anhui, China