🤖 AI Summary
To address the limitation of LKH-family algorithms in solving the Traveling Salesman Problem (TSP)—namely, their reliance on static candidate edge sets that hinder escape from local optima—this paper introduces, for the first time, a Multi-Armed Bandit (MAB)-based mechanism for dynamic candidate edge selection. Specifically, it proposes an online, adaptive, probabilistic edge update strategy guided by the Upper Confidence Bound (UCB) principle to evaluate and select edges in real time. This mechanism is seamlessly integrated into both LKH and LKH-3 frameworks, departing from conventional static candidate set paradigms. Extensive experiments on standard benchmarks (e.g., TSPLIB) demonstrate significant improvements over baseline LKH; substantial performance gains are also observed on TSP variants—including Probabilistic TSP (PTSP) and Generalized TSP (GTSP)—validating the method’s generalizability and robustness. The core contribution lies in pioneering MAB-driven dynamic candidate edge selection, establishing a transferable, adaptive decision-making paradigm for metaheuristic local search.
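The summary's core idea, scoring candidate edges with the Upper Confidence Bound (UCB) rule and picking the best-scoring edge each iteration, can be sketched as below. This is a minimal illustration of generic UCB1-style arm selection, not the paper's actual implementation; the reward statistics, the exploration constant `c`, and all names are assumptions for illustration.

```python
import math

def ucb_score(avg_reward, edge_pulls, total_pulls, c=1.0):
    """UCB1 score: empirical mean reward plus an exploration bonus.

    Untried edges score infinity so each candidate is sampled at
    least once before the bonus term starts to matter.
    """
    if edge_pulls == 0:
        return float("inf")
    return avg_reward + c * math.sqrt(math.log(total_pulls) / edge_pulls)

def select_candidate_edge(edges, stats, total_pulls, c=1.0):
    """Treat each candidate edge as a bandit arm and pick the
    one with the highest UCB score.

    `stats[e]` holds the edge's average observed reward ("avg",
    e.g. tour improvement when the edge was used) and its pull
    count ("n") -- both illustrative assumptions here.
    """
    return max(
        edges,
        key=lambda e: ucb_score(stats[e]["avg"], stats[e]["n"], total_pulls, c),
    )

if __name__ == "__main__":
    stats = {
        "e1": {"avg": 0.9, "n": 10},
        "e2": {"avg": 0.5, "n": 10},
        "e3": {"avg": 0.0, "n": 0},
    }
    # e3 has never been tried, so it is selected first.
    print(select_candidate_edge(["e1", "e2", "e3"], stats, 20))
    # Once every edge has been sampled, the high-reward edge wins.
    stats["e3"] = {"avg": 0.2, "n": 10}
    print(select_candidate_edge(["e1", "e2", "e3"], stats, 30))
```

The exploration bonus shrinks as an edge accumulates pulls, so rarely tried but potentially promising edges keep getting chances, which is exactly the escape-from-local-optima behavior the summary attributes to the dynamic candidate sets.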
📝 Abstract
Algorithms designed for routing problems typically rely on high-quality candidate edges to guide their search, aiming to reduce the search space and enhance search efficiency. However, many existing algorithms, such as the classical Lin-Kernighan-Helsgaun (LKH) algorithm for the Traveling Salesman Problem (TSP), use predetermined candidate edges that remain static throughout local search. This rigidity can cause the algorithm to get trapped in local optima, limiting its potential to find better solutions. To address this issue, we propose expanding the candidate sets to include other promising edges, giving them an opportunity to be selected. Specifically, we incorporate multi-armed bandit models to dynamically select the most suitable candidate edges in each iteration, enabling LKH to make smarter choices that lead to improved solutions. Extensive experiments on multiple TSP benchmarks demonstrate the excellent performance of our method. Moreover, we apply this bandit-based method to LKH-3, an extension of LKH tailored to various TSP variants, and our method also significantly enhances LKH-3's performance across typical TSP variants.