ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Exact algorithms for large-scale Traveling Salesman Problems (TSP) scale poorly, heuristic methods depend on manual parameter tuning, and learning-based approaches generalize and scale poorly because they are tied to fixed training data. Method: The authors propose ViTSP, a framework that uses pre-trained vision-language models (VLMs) to guide combinatorial optimization. ViTSP renders a TSP instance as an image, uses a VLM to identify promising small-scale subproblems, and hands those subproblems to an off-the-shelf solver (e.g., Concorde), iteratively refining the global solution. ViTSP requires no fine-tuning or user-side training, so it does not depend on a fixed training distribution. Contribution/Results: On real-world TSP instances with 1k to 88k nodes, ViTSP achieves average optimality gaps below 0.2% and outperforms existing learning-based methods. Under the same runtime budget, it reduces the optimality gaps of LKH-3, the best-performing heuristic solver, by 12% to 100%, particularly on instances with more than 10k nodes. The work offers a way to combine pre-trained generative models with classical operations research solvers.
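To make the reported numbers easier to read, here is a minimal sketch of the arithmetic behind an optimality gap and a relative gap reduction. It assumes the usual definition of the gap (tour length relative to an optimal or best-known length) and assumes that the 12% to 100% figure is a relative reduction of LKH-3's gap; the summary does not state either formula. The function names are hypothetical.

```python
def optimality_gap(tour_length: float, reference_length: float) -> float:
    """Gap in percent of a tour against a reference (optimal or best-known) length.

    Assumed definition; the source does not spell out its formula.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * (tour_length - reference_length) / reference_length


def relative_gap_reduction(baseline_gap: float, new_gap: float) -> float:
    """Percent by which new_gap reduces baseline_gap (100 means the gap is closed)."""
    if baseline_gap <= 0:
        raise ValueError("baseline_gap must be positive")
    return 100.0 * (baseline_gap - new_gap) / baseline_gap


# Illustrative, made-up lengths (not results from the paper).
baseline = optimality_gap(1005.0, 1000.0)   # gap of a baseline tour
improved = optimality_gap(1002.0, 1000.0)   # gap of an improved tour
print(relative_gap_reduction(baseline, improved))
```

Under this reading, a 100% reduction means the improved tour matches the reference length while the baseline does not.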

📝 Abstract

Solving the Traveling Salesman Problem (TSP) is NP-hard yet fundamental to a wide range of real-world applications. Classical exact methods face challenges in scaling, and heuristic methods often require domain-specific parameter calibration. While learning-based approaches have shown promise, they suffer from poor generalization and limited scalability due to fixed training data. This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. The VLMs identify promising small-scale subproblems from a visualized TSP instance, which are then efficiently optimized using an off-the-shelf solver to improve the global solution. ViTSP bypasses dedicated model training at the user end while maintaining effectiveness across diverse instances. Experiments on real-world TSP instances ranging from 1k to 88k nodes demonstrate that ViTSP consistently achieves solutions with average optimality gaps below 0.2%, outperforming existing learning-based methods. Under the same runtime budget, it surpasses the best-performing heuristic solver, LKH-3, by reducing its gaps by 12% to 100%, particularly on very-large-scale instances with more than 10k nodes. Our framework offers a new perspective on hybridizing pre-trained generative models and operations research solvers for solving combinatorial optimization problems, with practical implications for integration into more complex logistics systems. The code is available at https://anonymous.4open.science/r/ViTSP_codes-6683.
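The select-then-reoptimize idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a caller-supplied bounding box stands in for the region a VLM would pick (the box format is an assumption), and brute-force enumeration over short tour segments stands in for the off-the-shelf solver. All function names are hypothetical.

```python
import itertools
import math


def tour_length(pts, tour):
    """Length of the closed tour that visits pts in the order given by tour."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def reoptimize_segment(pts, tour, start, size):
    """Reorder tour[start:start+size] optimally in place, keeping both neighbours fixed.

    Brute force is a stand-in for an exact solver and is only viable for tiny segments.
    """
    before = tour[start - 1]                      # index -1 wraps to the last node
    after = tour[(start + size) % len(tour)]

    def path_cost(order):
        nodes = (before, *order, after)
        return sum(math.dist(pts[a], pts[b]) for a, b in zip(nodes, nodes[1:]))

    tour[start:start + size] = min(
        itertools.permutations(tour[start:start + size]), key=path_cost
    )


def refine_in_box(pts, tour, box, max_size=7):
    """Return a copy of tour with every run of consecutive in-box nodes reoptimized."""
    tour = list(tour)
    xmin, ymin, xmax, ymax = box                  # assumed selection format
    cap = min(max_size, len(tour) - 1)            # keep at least one node fixed
    runs = []
    for pos, node in enumerate(tour):
        x, y = pts[node]
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        if runs and pos == runs[-1][-1] + 1 and len(runs[-1]) < cap:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    for run in runs:
        if len(run) >= 2:
            reoptimize_segment(pts, tour, run[0], len(run))
    return tour


def refine(pts, tour, boxes):
    """Apply the box-guided refinement once per selected box."""
    for box in boxes:
        tour = refine_in_box(pts, tour, box)
    return tour


# Toy instance: six collinear points and a deliberately poor starting tour.
points = [(float(i), 0.0) for i in range(6)]
start_tour = [0, 3, 1, 4, 2, 5]
better = refine(points, start_tour, boxes=[(-1.0, -1.0, 6.0, 1.0)])
print(tour_length(points, start_tour), tour_length(points, better))
```

Because the current order is always among the candidates, each segment reoptimization can only keep or shorten the tour, which is the property that makes iterating over selected subproblems safe.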

Problem

Research questions and friction points this paper is trying to address.

Leverages vision language models to solve large-scale traveling salesman problems

Identifies promising subproblems visually to improve global solution optimization

Bypasses dedicated training while maintaining effectiveness across diverse instances

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision language models to identify subproblems

Optimizes subproblems with off-the-shelf solvers

Bypasses dedicated training while maintaining effectiveness
