🤖 AI Summary
This paper addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), an NP-hard combinatorial optimization problem that minimizes the length of the longest tour among $m$ salesmen. To overcome the poor generalization and unreliable solution quality of existing learning-based approaches, we propose the Generate-and-Split (GaS) framework. In the first stage, an LSTM-enhanced reinforcement learning model generates an ordered city sequence; in the second stage, a differentiable, near-linear optimal segmentation algorithm jointly partitions the sequence into $m$ balanced tours. GaS is the first end-to-end trainable paradigm that co-optimizes tour generation and segmentation, significantly improving both solution quality and cross-scale generalization. Experiments on multiple benchmarks demonstrate that GaS consistently outperforms state-of-the-art learning-based methods, with particularly pronounced gains on large-scale instances.
📄 Abstract
This study addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), which aims to coordinate tours for multiple salesmen such that the length of the longest tour is minimized. Due to its NP-hard nature, exact solvers become impractical under the assumption that $P \ne NP$. As a result, learning-based approaches have gained traction for their ability to rapidly generate high-quality approximate solutions. Among these, two-stage methods combine learning-based components with classical solvers, simplifying the learning objective. However, this decoupling often disrupts consistent optimization, potentially degrading solution quality. To address this issue, we propose a novel two-stage framework named **Generate-and-Split** (GaS), which integrates reinforcement learning (RL) with an optimal splitting algorithm in a joint training process. The splitting algorithm offers near-linear scalability with respect to the number of cities and guarantees optimal splitting in Euclidean space for any given path. To facilitate the joint optimization of the RL component with the algorithm, we adopt an LSTM-enhanced model architecture to address partial observability. Extensive experiments show that the proposed GaS framework significantly outperforms existing learning-based approaches in both solution quality and transferability.
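The abstract does not spell out the splitting procedure, but the standard way to optimally cut a fixed city sequence into $m$ contiguous tours while minimizing the longest tour is a binary search on the answer with a greedy feasibility check, which runs in near-linear time per search step. The sketch below illustrates this generic technique only; it is not the paper's algorithm. It assumes all salesmen share one depot, each tour visits a contiguous slice of the given sequence, and distances are Euclidean (the triangle inequality makes the greedy check valid). The function names (`feasible`, `min_max_split`) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def feasible(seq, depot, m, limit):
    """Greedily test whether seq can be cut into <= m contiguous
    depot-anchored tours, each of length <= limit. Valid in metric
    spaces: extending a segment never shortens its tour."""
    tours = 1
    if 2 * dist(depot, seq[0]) > limit:
        return False            # even a single-city tour is too long
    first, prev, path = seq[0], seq[0], 0.0
    for c in seq[1:]:
        step = dist(prev, c)
        total = dist(depot, first) + path + step + dist(c, depot)
        if total <= limit:      # keep extending the current tour
            path += step
            prev = c
        else:                   # close the tour, start a new one at c
            tours += 1
            if tours > m or 2 * dist(depot, c) > limit:
                return False
            first, prev, path = c, c, 0.0
    return True

def min_max_split(seq, depot, m, iters=60):
    """Binary search on the max tour length; returns the optimal
    value (to floating-point tolerance) over contiguous splits."""
    lo = 0.0
    hi = (dist(depot, seq[0])
          + sum(dist(a, b) for a, b in zip(seq, seq[1:]))
          + dist(seq[-1], depot))   # one salesman does everything
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(seq, depot, m, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each feasibility check is a single O(n) pass, so the overall cost is O(n) times the number of bisection steps, consistent with the near-linear scalability the abstract claims for its (exact, differentiable) splitter.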