AI Summary
Exact algorithms for large-scale Traveling Salesman Problems (TSP) suffer from poor scalability, while heuristic methods rely heavily on manual parameter tuning and learning-based approaches exhibit limited generalization and scalability.
Method: We propose ViTSP, the first method to leverage pre-trained vision-language models (VLMs) for combinatorial optimization. ViTSP renders a TSP instance as an image, uses VLMs to identify promising subproblems, and delegates exact optimization of those subproblems to an off-the-shelf solver (e.g., Concorde), iteratively refining the global solution through solver-VLM co-optimization. Crucially, ViTSP requires no fine-tuning or user-side training, eliminating dependence on fixed training distributions.
Contribution/Results: ViTSP achieves strong zero-shot generalization across diverse TSP sizes (1k–88k nodes) and node distributions. Experiments on real-world instances show average optimality gaps below 0.2%, outperforming existing learning-based methods. Under the same runtime budget, ViTSP reduces the optimality gaps of LKH-3 by 12%–100%, particularly on instances with more than 10k nodes. ViTSP establishes a novel paradigm integrating generative AI with classical OR solvers.
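To make the gap figures above concrete, here is a minimal Python sketch of how an optimality gap and a relative gap reduction are commonly computed. The function names are hypothetical, and the relative-reduction formula is an assumption about how "reducing gaps by 12% to 100%" is measured; the summary does not spell it out. Under this reading, a 100% reduction would mean the baseline's remaining gap is closed entirely.

```python
def optimality_gap(tour_length: float, best_known: float) -> float:
    """Percentage by which a tour exceeds the best-known (or optimal) length."""
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (tour_length - best_known) / best_known


def gap_reduction(baseline_gap: float, new_gap: float) -> float:
    """Relative reduction (in %) of new_gap with respect to baseline_gap.

    Assumed definition, not confirmed by the source.
    """
    if baseline_gap <= 0:
        raise ValueError("baseline_gap must be positive")
    return 100.0 * (baseline_gap - new_gap) / baseline_gap


# Illustrative numbers only; they are not results from the paper.
baseline = optimality_gap(1005.0, 1000.0)   # baseline tour vs. best-known length
candidate = optimality_gap(1002.0, 1000.0)  # candidate tour vs. best-known length
print(baseline, candidate, gap_reduction(baseline, candidate))
```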
Abstract
Solving the Traveling Salesman Problem (TSP) is NP-hard yet fundamental to a wide range of real-world applications. Classical exact methods face challenges in scaling, and heuristic methods often require domain-specific parameter calibration. While learning-based approaches have shown promise, they suffer from poor generalization and limited scalability due to fixed training data. This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. The VLMs identify promising small-scale subproblems from a visualized TSP instance, which are then efficiently optimized using an off-the-shelf solver to improve the global solution. ViTSP requires no dedicated model training on the user's side while remaining effective across diverse instances. Experiments on real-world TSP instances ranging from 1k to 88k nodes demonstrate that ViTSP consistently achieves solutions with average optimality gaps below 0.2%, outperforming existing learning-based methods. Under the same runtime budget, it surpasses the best-performing heuristic solver, LKH-3, reducing its gaps by 12% to 100%, particularly on very-large-scale instances with more than 10k nodes. Our framework offers a new perspective on hybridizing pre-trained generative models and operations research solvers to solve combinatorial optimization problems, with practical implications for integration into more complex logistics systems. The code is available at https://anonymous.4open.science/r/ViTSP_codes-6683.
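The select-then-reoptimize loop described in the abstract can be sketched in Python as follows. This is a heavily simplified illustration, not the authors' implementation: `select_segment` is a hypothetical stand-in for the VLM (it just slides a window along the tour instead of inspecting an image), and `solve_segment_exact` is a brute-force stand-in for an exact solver such as Concorde. Treating a subproblem as a contiguous window of the current tour is likewise an assumption made to keep the example short.

```python
import itertools
import math
import random

Point = tuple[float, float]


def tour_length(points: list[Point], tour: list[int]) -> float:
    """Length of the closed tour that visits `tour` in order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def path_length(points: list[Point], path: list[int]) -> float:
    """Length of the open path that visits `path` in order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(path, path[1:]))


def select_segment(tour: list[int], step: int, size: int) -> int:
    """Hypothetical stand-in for the VLM: return the start index of a window."""
    return (step * (size - 2)) % (len(tour) - size + 1)


def solve_segment_exact(points: list[Point], segment: list[int]) -> list[int]:
    """Brute-force stand-in for an exact solver; both endpoints stay fixed."""
    first, *interior, last = segment
    candidates = ([first, *perm, last] for perm in itertools.permutations(interior))
    return min(candidates, key=lambda p: path_length(points, p))


def refine(points: list[Point], tour: list[int], rounds: int = 50, size: int = 7) -> list[int]:
    """Repeatedly reoptimize small windows, keeping only strict improvements."""
    tour = list(tour)
    size = min(size, len(tour))
    if size < 3:
        return tour  # no interior nodes to reorder
    for step in range(rounds):
        start = select_segment(tour, step, size)
        segment = tour[start:start + size]
        improved = solve_segment_exact(points, segment)
        if path_length(points, improved) < path_length(points, segment):
            tour[start:start + size] = improved
    return tour


random.seed(0)
points = [(random.random(), random.random()) for _ in range(40)]
initial = list(range(40))
refined = refine(points, initial)
print(tour_length(points, initial), tour_length(points, refined))
```

Because the window's endpoints are fixed and the original ordering is among the candidates, each accepted replacement shortens the tour and the node set is preserved. In a real pipeline, the window choice would come from the VLM's reading of the rendered instance, and the brute-force step would be replaced by a call to the chosen solver.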