Multi-armed Bandit and Backbone boost Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problems

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
LKH-type algorithms for the Traveling Salesman Problem (TSP) and its variants suffer from premature convergence to local optima and ineffective exploitation of historical search information via the α-parameter. Method: We propose an adaptive edge evaluation framework integrating dynamic backbone edge identification with Multi-Armed Bandits (MAB). It jointly models backbone edges, α-values, and distance information as learnable MAB “arms”, enabling online, dynamic selection and optimization of path evaluation metrics; it further introduces, for the first time in LKH, an iteratively updated dynamic backbone structure. Contribution/Results: The method is compatible with LKH and LKH-3 local search and supports diverse problem formulations—including TSP, Capacitated Vehicle Routing Problem with Time Windows (CVRPTW), and Colored TSP. Experiments demonstrate substantial improvements in solution quality on standard TSP benchmarks and significant performance gains over LKH-3 on CVRPTW and Colored TSP, confirming strong generalization and robustness.
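The summary's "iteratively updated dynamic backbone structure" can be illustrated with a minimal sketch: count how often each edge appears across the local optima found so far, and treat high-frequency edges as the current backbone. The function name, the frequency threshold (0.8), and the toy tours below are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def update_backbone(edge_counts, local_optimum_edges, num_solutions):
    """Accumulate edge occurrences over local optima; edges appearing in
    most of them form the (dynamic) backbone."""
    edge_counts.update(local_optimum_edges)
    threshold = 0.8 * num_solutions  # hypothetical frequency cutoff
    return {e for e, c in edge_counts.items() if c >= threshold}

counts = Counter()
tour1 = {(0, 1), (1, 2), (2, 0)}  # edges of a first local optimum
tour2 = {(0, 1), (1, 3), (3, 0)}  # edges of a second local optimum
backbone = update_backbone(counts, tour1, 1)
backbone = update_backbone(counts, tour2, 2)
# (0, 1) appears in both local optima, so only it survives the cutoff
```

Because the counts are updated every time a new local optimum is found, the backbone set changes as the search progresses, which is what makes it "dynamic" rather than a fixed preprocessing step.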

📝 Abstract
The Lin-Kernighan-Helsgaun (LKH) heuristic is a classic local search algorithm for the Traveling Salesman Problem (TSP). LKH introduces an $\alpha$-value to replace the traditional distance metric for evaluating edge quality, which leads to a significant improvement. However, we observe that the $\alpha$-value does not make full use of the historical information gathered during the search, and a single source of guiding information often makes it hard for LKH to escape from certain local optima. To address these issues, we propose a novel way to extract backbone information during the TSP local search process; this backbone is dynamic and is updated whenever a local optimal solution is found. We further propose to combine backbone information, the $\alpha$-value, and distance to evaluate edge quality and guide the search. Moreover, we model their different combinations as arms of a multi-armed bandit (MAB) and use the MAB to select an appropriate evaluation metric dynamically. Both the backbone information and the MAB provide diverse guiding information and learn from the search history to suggest the best metric. We apply our methods to LKH and LKH-3, an extended version of LKH that can solve about 40 variants of the TSP and the Vehicle Routing Problem (VRP). Extensive experiments show the excellent performance and generalization capability of our proposed method, which significantly improves LKH for the TSP and LKH-3 for two representative TSP and VRP variants, the Colored TSP (CTSP) and the Capacitated VRP with Time Windows (CVRPTW).
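The abstract's core idea, treating each combination of edge-evaluation metrics as a bandit arm and selecting among them online, can be sketched with a standard UCB1 policy. The arm names, the reward model (a stand-in for tour improvement), and the loop length are illustrative assumptions; the paper's actual bandit formulation and reward definition may differ.

```python
import math
import random

# hypothetical arms: candidate edge-evaluation metrics
METRICS = ["distance", "alpha", "alpha+backbone"]

def ucb1_select(counts, rewards, t):
    """Pick the arm maximizing average reward plus an exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:  # play every arm once before applying the UCB rule
            return i
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2.0 * math.log(t) / counts[i]))

counts = [0] * len(METRICS)
rewards = [0.0] * len(METRICS)
for t in range(1, 101):
    arm = ucb1_select(counts, rewards, t)
    # stand-in reward: in the algorithm this would score the local-search
    # run guided by METRICS[arm], e.g. by the tour-length improvement
    reward = random.random() * (1.5 if arm == 2 else 1.0)
    counts[arm] += 1
    rewards[arm] += reward
```

Over many restarts, the policy concentrates pulls on whichever metric has historically produced the best improvements, while the exploration bonus keeps the other metrics available when the search landscape changes.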
Problem

Research questions and friction points this paper is trying to address.

LKH Algorithm
Traveling Salesman Problem
Multi-criteria Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Armed Bandit Strategies
Dynamic Hints Integration
Enhanced LKH Algorithm
Long Wang
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074
Jiongzhi Zheng
Huazhong University of Science and Technology
Combinatorial Optimization · Reinforcement Learning · Machine Learning · Artificial Intelligence
Zhengda Xiong
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074
Kun He
School of Computer Science and Technology, Huazhong University of Science and Technology, China 430074