Solving the Min-Max Multiple Traveling Salesmen Problem via Learning-Based Path Generation and Optimal Splitting

📅 2025-08-23
🤖 AI Summary
This paper addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), an NP-hard combinatorial optimization problem that minimizes the length of the longest tour among $m$ salesmen. To overcome the poor generalization and unreliable solution quality of existing learning-based approaches, we propose the Generate-and-Split (GaS) framework. In the first stage, an LSTM-enhanced reinforcement learning model generates an ordered city sequence; in the second stage, a differentiable, near-linear optimal segmentation algorithm jointly partitions the sequence into $m$ balanced tours. GaS is the first end-to-end trainable paradigm that co-optimizes tour generation and segmentation, significantly improving both solution quality and cross-scale generalization. Experiments on multiple benchmarks demonstrate that GaS consistently outperforms state-of-the-art learning-based methods, with particularly pronounced gains on large-scale instances.

📝 Abstract
This study addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), which aims to coordinate tours for multiple salesmen such that the length of the longest tour is minimized. Due to its NP-hard nature, exact solvers become impractical under the assumption that $P \neq NP$. As a result, learning-based approaches have gained traction for their ability to rapidly generate high-quality approximate solutions. Among these, two-stage methods combine learning-based components with classical solvers, simplifying the learning objective. However, this decoupling often disrupts consistent optimization, potentially degrading solution quality. To address this issue, we propose a novel two-stage framework named \textbf{Generate-and-Split} (GaS), which integrates reinforcement learning (RL) with an optimal splitting algorithm in a joint training process. The splitting algorithm offers near-linear scalability with respect to the number of cities and guarantees optimal splitting in Euclidean space for any given path. To facilitate the joint optimization of the RL component with the algorithm, we adopt an LSTM-enhanced model architecture to address partial observability. Extensive experiments show that the proposed GaS framework significantly outperforms existing learning-based approaches in both solution quality and transferability.
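To make the splitting step concrete, here is a minimal sketch of one way to split a fixed city sequence into at most m depot-to-depot tours while minimizing the longest tour. It is not the paper's algorithm: the single shared depot, the function names `greedy_split` and `split_min_max`, and the choice of binary search over the length limit with a greedy feasibility check are all assumptions made for illustration. The greedy check relies on the triangle inequality, which holds in Euclidean space: dropping a city from either end of a segment never lengthens that tour.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_split(depot, path, limit):
    """Cut `path` into consecutive segments, extending each as far as `limit` allows.

    Each segment is served as depot -> segment -> depot (single depot is an assumption).
    Returns a list of (start, end) index pairs, or None if some city cannot be
    served within `limit` even on its own.
    """
    segments = []
    i, n = 0, len(path)
    while i < n:
        length = dist(depot, path[i])  # leg from the depot to the first city
        if length + dist(path[i], depot) > limit:
            return None
        j = i
        while j + 1 < n:
            extended = length + dist(path[j], path[j + 1])
            if extended + dist(path[j + 1], depot) > limit:
                break
            length, j = extended, j + 1
        segments.append((i, j))
        i = j + 1
    return segments


def split_min_max(depot, path, m, iterations=60):
    """Binary-search the smallest limit for which the greedy split uses <= m tours."""
    # Lower bound: the longest round trip to any single city.
    lo = max(2 * dist(depot, c) for c in path)
    # Upper bound: one salesman walks the whole path.
    hi = dist(depot, path[0])
    for a, b in zip(path, path[1:]):
        hi += dist(a, b)
    hi += dist(path[-1], depot)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        segments = greedy_split(depot, path, mid)
        if segments is not None and len(segments) <= m:
            hi = mid
        else:
            lo = mid
    return hi, greedy_split(depot, path, hi)


if __name__ == "__main__":
    depot = (0.0, 0.0)
    path = [(1.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (-1.0, 0.0)]
    longest, segments = split_min_max(depot, path, m=2)
    print(round(longest, 4), segments)
```

Each feasibility check is a single pass over the path, so the whole search costs a linear pass per binary-search iteration. The result is accurate up to the binary-search tolerance rather than exact, which is one place where a dedicated optimal splitting algorithm could differ from this sketch.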
Problem

Research questions and friction points this paper is trying to address.

Minimizing longest tour length for multiple salesmen
Addressing NP-hard complexity with learning methods
Integrating reinforcement learning with optimal splitting
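
The min-max objective in the first item can be written out as follows. The notation is an assumption for illustration, not taken from the paper: a single depot $c_0$, a distance function $d$, and $m$ tours $\tau_1, \dots, \tau_m$ that together visit every city exactly once, where $\tau_k^{i}$ is the $i$-th city on tour $k$.

$$
\min_{\tau_1, \dots, \tau_m} \; \max_{1 \le k \le m} L(\tau_k),
\qquad
L(\tau_k) = d(c_0, \tau_k^{1}) + \sum_{i=1}^{|\tau_k| - 1} d(\tau_k^{i}, \tau_k^{i+1}) + d(\tau_k^{|\tau_k|}, c_0)
$$
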
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for path generation
Optimal splitting algorithm integration
LSTM-enhanced model to address partial observability
Wen Wang
Nanjing University
Xiangchen Wu
Nanjing University
Liang Wang
Nanjing University
Hao Hu
Nanjing University
Xianping Tao
Professor of Computer Science, Nanjing University
Software Engineering, Pervasive Computing
Linghao Zhang
State Grid Sichuan Electric Power Research Institute