Multi-armed Bandit and Backbone boost Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problems

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
LKH-type algorithms for the Traveling Salesman Problem (TSP) and its variants suffer from premature convergence to local optima and ineffective exploitation of historical search information via the α-parameter. Method: We propose an adaptive edge evaluation framework integrating dynamic backbone edge identification with Multi-Armed Bandits (MAB). It jointly models backbone edges, α-values, and distance information as learnable MAB “arms”, enabling online, dynamic selection and optimization of path evaluation metrics; it further introduces, for the first time in LKH, an iteratively updated dynamic backbone structure. Contribution/Results: The method is compatible with LKH and LKH-3 local search and supports diverse problem formulations—including TSP, Capacitated Vehicle Routing Problem with Time Windows (CVRPTW), and Colored TSP. Experiments demonstrate substantial improvements in solution quality on standard TSP benchmarks and significant performance gains over LKH-3 on CVRPTW and Colored TSP, confirming strong generalization and robustness.
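The summary's "iteratively updated dynamic backbone structure" can be illustrated with a minimal sketch: count how often each edge appears across the local optima found so far, and treat high-frequency edges as the current backbone. The function name, the frequency threshold (0.8), and the toy tours below are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def update_backbone(edge_counts, local_optimum_edges, num_solutions):
    """Accumulate edge occurrences over local optima; edges appearing in
    most of them form the (dynamic) backbone."""
    edge_counts.update(local_optimum_edges)
    threshold = 0.8 * num_solutions  # hypothetical frequency cutoff
    return {e for e, c in edge_counts.items() if c >= threshold}

counts = Counter()
tour1 = {(0, 1), (1, 2), (2, 0)}  # edges of a first local optimum
tour2 = {(0, 1), (1, 3), (3, 0)}  # edges of a second local optimum
backbone = update_backbone(counts, tour1, 1)
backbone = update_backbone(counts, tour2, 2)
# (0, 1) appears in both local optima, so only it survives the cutoff
```

Because the counts are updated every time a new local optimum is found, the backbone set changes as the search progresses, which is what makes it "dynamic" rather than a fixed preprocessing step.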

📝 Abstract
The Lin-Kernighan-Helsgaun (LKH) heuristic is a classic local search algorithm for the Traveling Salesman Problem (TSP). LKH introduces an $\alpha$-value to replace the traditional distance metric for evaluating edge quality, which leads to a significant improvement. However, we observe that the $\alpha$-value does not make full use of the historical information gathered during the search, and a single source of guiding information often makes it hard for LKH to escape from certain local optima. To address these issues, we propose a novel way to extract backbone information during the TSP local search process; this backbone is dynamic and is updated whenever a local optimal solution is found. We further propose to combine backbone information, the $\alpha$-value, and distance to evaluate edge quality and guide the search. Moreover, we model their different combinations as arms of a multi-armed bandit (MAB) and use the MAB to select an appropriate evaluation metric dynamically. Both the backbone information and the MAB provide diverse guiding information and learn from the search history to suggest the best metric. We apply our methods to LKH and LKH-3, an extended version of LKH that can solve about 40 variants of the TSP and the Vehicle Routing Problem (VRP). Extensive experiments show the excellent performance and generalization capability of our proposed method, which significantly improves LKH for the TSP and LKH-3 for two representative TSP and VRP variants, the Colored TSP (CTSP) and the Capacitated VRP with Time Windows (CVRPTW).
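The abstract's core idea, treating each combination of edge-evaluation metrics as a bandit arm and selecting among them online, can be sketched with a standard UCB1 policy. The arm names, the reward model (a stand-in for tour improvement), and the loop length are illustrative assumptions; the paper's actual bandit formulation and reward definition may differ.

```python
import math
import random

# hypothetical arms: candidate edge-evaluation metrics
METRICS = ["distance", "alpha", "alpha+backbone"]

def ucb1_select(counts, rewards, t):
    """Pick the arm maximizing average reward plus an exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:  # play every arm once before applying the UCB rule
            return i
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2.0 * math.log(t) / counts[i]))

counts = [0] * len(METRICS)
rewards = [0.0] * len(METRICS)
for t in range(1, 101):
    arm = ucb1_select(counts, rewards, t)
    # stand-in reward: in the algorithm this would score the local-search
    # run guided by METRICS[arm], e.g. by the tour-length improvement
    reward = random.random() * (1.5 if arm == 2 else 1.0)
    counts[arm] += 1
    rewards[arm] += reward
```

Over many restarts, the policy concentrates pulls on whichever metric has historically produced the best improvements, while the exploration bonus keeps the other metrics available when the search landscape changes.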
Problem

Research questions and friction points this paper is trying to address.

LKH Algorithm
Traveling Salesman Problem
Multi-criteria Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Armed Bandit Strategies
Dynamic Hints Integration
Enhanced LKH Algorithm
Long Wang
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074
Jiongzhi Zheng
Huazhong University of Science and Technology
Combinatorial Optimization · Reinforcement Learning · Machine Learning · Artificial Intelligence
Zhengda Xiong
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074
Kun He
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074