ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems

2025-09-27
AI Summary
Problem: Exact algorithms for large-scale Traveling Salesman Problems (TSP) scale poorly, heuristic methods rely heavily on manual parameter tuning, and learning-based approaches show limited generalization and scalability. Method: We propose ViTSP, the first method to leverage pre-trained vision-language models (VLMs) for combinatorial optimization. ViTSP visualizes TSP instances as images, employs VLMs to identify high-potential subproblems, and delegates exact optimization of these subproblems to off-the-shelf solvers (e.g., Concorde), iteratively refining the global solution through solver-VLM co-optimization. Crucially, ViTSP requires no fine-tuning or user-side training, eliminating dependence on fixed training distributions. Contribution/Results: ViTSP achieves strong zero-shot generalization across diverse TSP sizes (1k to 88k nodes) and node distributions. Experiments show average optimality gaps below 0.2%, outperforming existing learning-based methods. Under the same runtime budget, ViTSP reduces the optimality gaps of LKH-3 by 12% to 100%, particularly on instances with more than 10k nodes. ViTSP establishes a novel paradigm integrating generative AI with classical OR solvers.

Abstract
Solving the Traveling Salesman Problem (TSP) is NP-hard yet fundamental to a wide range of real-world applications. Classical exact methods face challenges in scaling, and heuristic methods often require domain-specific parameter calibration. While learning-based approaches have shown promise, they suffer from poor generalization and limited scalability due to fixed training data. This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. The VLMs identify promising small-scale subproblems from a visualized TSP instance, which are then efficiently optimized using an off-the-shelf solver to improve the global solution. ViTSP bypasses dedicated model training at the user end while maintaining effectiveness across diverse instances. Experiments on real-world TSP instances ranging from 1k to 88k nodes demonstrate that ViTSP consistently achieves solutions with average optimality gaps below 0.2%, outperforming existing learning-based methods. Under the same runtime budget, it surpasses the best-performing heuristic solver, LKH-3, by reducing its gaps by 12% to 100%, particularly on very-large-scale instances with more than 10k nodes. Our framework offers a new perspective on hybridizing pre-trained generative models and operations research solvers for combinatorial optimization problems, with practical implications for integration into more complex logistics systems. The code is available at https://anonymous.4open.science/r/ViTSP_codes-6683.
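The optimality-gap and gap-reduction figures quoted above can be read as simple ratios. The following is a minimal sketch, assuming the gap is measured against a reference (optimal or best-known) tour length and that the reduction is expressed relative to the baseline solver's gap. The function names and the numbers are illustrative assumptions, not values from the paper.

```python
def optimality_gap(tour_length: float, reference_length: float) -> float:
    """Relative gap, in percent, of a tour against a reference tour length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * (tour_length - reference_length) / reference_length


def gap_reduction(baseline_gap: float, new_gap: float) -> float:
    """Percentage by which new_gap shrinks baseline_gap (100 closes the gap)."""
    if baseline_gap <= 0:
        raise ValueError("baseline_gap must be positive")
    return 100.0 * (baseline_gap - new_gap) / baseline_gap


# Made-up numbers for illustration only; they are not results from the paper.
reference = 1000.0
baseline_gap = optimality_gap(1004.0, reference)  # baseline solver, about 0.4%
new_gap = optimality_gap(1001.0, reference)       # improved tour, about 0.1%
reduction = gap_reduction(baseline_gap, new_gap)  # about 75%
```

Under this reading, a 100% reduction means the improved tour matches the reference length, so its gap is zero.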
Problem

Research questions and friction points this paper is trying to address.

Leverages vision language models to solve large-scale traveling salesman problems
Identifies promising subproblems visually to improve global solution optimization
Bypasses dedicated training while maintaining effectiveness across diverse instances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision language models to identify subproblems
Optimizes subproblems with off-the-shelf solvers
Bypasses dedicated training while maintaining effectiveness
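The select-then-reoptimize loop summarized above can be sketched on a toy instance. This is a simplified illustration, not the authors' implementation: `propose_box` is a hypothetical stand-in for the VLM call, brute-force enumeration stands in for the exact solver, and restricting the subproblem to the longest contiguous tour slice inside the box is an assumption made here for brevity.

```python
import itertools
import math


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given node order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def longest_run_in_box(coords, tour, box):
    """Return (start, end) of the longest contiguous tour slice inside box."""
    x0, y0, x1, y1 = box
    best, start = (0, 0), None
    for i, node in enumerate(tour + [None]):  # sentinel closes a trailing run
        inside = (
            node is not None
            and x0 <= coords[node][0] <= x1
            and y0 <= coords[node][1] <= y1
        )
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def reoptimize_slice(coords, tour, start, end, max_nodes=8):
    """Brute-force the interior of tour[start:end] with both endpoints fixed."""
    end = min(end, start + max_nodes)  # keep the enumeration small
    segment = tour[start:end]
    if len(segment) < 4:
        return tour  # fewer than two interior nodes: nothing to reorder

    def path_length(path):
        return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

    first, last = segment[0], segment[-1]
    best = min(
        ([first, *perm, last] for perm in itertools.permutations(segment[1:-1])),
        key=path_length,
    )
    return tour[:start] + best + tour[end:]


def refine(coords, tour, propose_box, rounds=3):
    """Repeat: select a region, re-optimize it, keep the result if it is shorter."""
    for _ in range(rounds):
        box = propose_box(coords, tour)  # hypothetical stand-in for the VLM call
        start, end = longest_run_in_box(coords, tour, box)
        candidate = reoptimize_slice(coords, tour, start, end)
        if tour_length(coords, candidate) < tour_length(coords, tour):
            tour = candidate
    return tour


# Toy instance: eight points on a 5 x 1 rectangle with a scrambled initial tour.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (5, 1), (0, 1)]
initial_tour = [0, 2, 1, 4, 3, 5, 6, 7]
refined_tour = refine(coords, initial_tour, lambda c, t: (-1, -1, 6, 2))
```

Because the endpoints of the slice stay fixed, shortening the path inside the slice can only shorten the overall tour, which is why a local exact solve is safe to splice back into the global solution.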
Zhuoli Yin
Edwardson School of Industrial Engineering, Purdue University, West Lafayette, USA
Yi Ding
Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
Reem Khir
Edwardson School of Industrial Engineering, Purdue University, West Lafayette, USA
Hua Cai
Thomas and Jane Schmidt Rising Star Associate Professor, Purdue University
Shared Mobility, Sustainable Systems, AI for Sustainability, Environmental & Ecological Engineering