🤖 AI Summary
This paper addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), an NP-hard combinatorial optimization problem that minimizes the length of the longest tour among $m$ salesmen. To overcome the poor generalization and unreliable solution quality of existing learning-based approaches, we propose the Generate-and-Split (GaS) framework. In the first stage, an LSTM-enhanced reinforcement learning model generates an ordered city sequence; in the second stage, a differentiable, near-linear optimal segmentation algorithm jointly partitions the sequence into $m$ balanced tours. GaS is the first end-to-end trainable paradigm that co-optimizes tour generation and segmentation, significantly improving both solution quality and cross-scale generalization. Experiments on multiple benchmarks demonstrate that GaS consistently outperforms state-of-the-art learning-based methods, with particularly pronounced gains on large-scale instances.
📄 Abstract
This study addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), which aims to coordinate tours for multiple salesmen such that the length of the longest tour is minimized. Due to its NP-hard nature, exact solvers become impractical under the assumption that $P \ne NP$. As a result, learning-based approaches have gained traction for their ability to rapidly generate high-quality approximate solutions. Among these, two-stage methods combine learning-based components with classical solvers, simplifying the learning objective. However, this decoupling often disrupts consistent optimization, potentially degrading solution quality. To address this issue, we propose a novel two-stage framework named **Generate-and-Split** (GaS), which integrates reinforcement learning (RL) with an optimal splitting algorithm in a joint training process. The splitting algorithm offers near-linear scalability with respect to the number of cities and guarantees optimal splitting in Euclidean space for any given path. To facilitate the joint optimization of the RL component with the algorithm, we adopt an LSTM-enhanced model architecture to address partial observability. Extensive experiments show that the proposed GaS framework significantly outperforms existing learning-based approaches in both solution quality and transferability.
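The abstract does not spell out the splitting procedure, but the standard way to optimally cut a fixed city sequence into $m$ contiguous tours while minimizing the longest tour is a binary search on the answer with a greedy feasibility check, which runs in near-linear time per search step. The sketch below illustrates this generic technique only; it is not the paper's algorithm. It assumes all salesmen share one depot, each tour visits a contiguous slice of the given sequence, and distances are Euclidean (the triangle inequality makes the greedy check valid). The function names (`feasible`, `min_max_split`) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def feasible(seq, depot, m, limit):
    """Greedily test whether seq can be cut into <= m contiguous
    depot-anchored tours, each of length <= limit. Valid in metric
    spaces: extending a segment never shortens its tour."""
    tours = 1
    if 2 * dist(depot, seq[0]) > limit:
        return False            # even a single-city tour is too long
    first, prev, path = seq[0], seq[0], 0.0
    for c in seq[1:]:
        step = dist(prev, c)
        total = dist(depot, first) + path + step + dist(c, depot)
        if total <= limit:      # keep extending the current tour
            path += step
            prev = c
        else:                   # close the tour, start a new one at c
            tours += 1
            if tours > m or 2 * dist(depot, c) > limit:
                return False
            first, prev, path = c, c, 0.0
    return True

def min_max_split(seq, depot, m, iters=60):
    """Binary search on the max tour length; returns the optimal
    value (to floating-point tolerance) over contiguous splits."""
    lo = 0.0
    hi = (dist(depot, seq[0])
          + sum(dist(a, b) for a, b in zip(seq, seq[1:]))
          + dist(seq[-1], depot))   # one salesman does everything
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(seq, depot, m, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each feasibility check is a single O(n) pass, so the overall cost is O(n) times the number of bisection steps, consistent with the near-linear scalability the abstract claims for its (exact, differentiable) splitter.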