🤖 AI Summary
This work addresses large-scale Vehicle Routing Problems (VRP), which are hard to solve because of their combinatorial complexity and for which traditional handcrafted heuristics often generalize poorly across instances. The authors propose COAgents, a cooperative multi-agent framework that models the search process as a dynamically constructed Partial Search Graph whose nodes are solutions and whose edges are local refinements or large perturbations (jumps). Three specialized agents guide the search: a Node Selection Agent and a Move Selection Agent handle intensification, and a Jump Agent decides when to explore new regions. By separating problem-agnostic search control from a compact domain-specific encoding, the framework is designed to adapt across tasks. On CVRP benchmarks the method remains competitive with several learn-to-search baselines, and on VRPTW benchmarks it sets a new state of the art among learning-based approaches, reducing the gap to the best-known solutions by 14% at N = 100 and 44% at N = 50 relative to POMO, and by 21% and 40% respectively relative to ALNS.
📝 Abstract
Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditional heuristics rely on handcrafted rules for local improvements and occasional \textit{jumps} to escape local minima, but often struggle to generalize across diverse instances. We introduce \textbf{COAgents}, a cooperative multi-agent framework that models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations for diversification (i.e., jumps). A \textit{Partial Search Graph} (PSG) is dynamically constructed during search, enabling COAgents to train a Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent to trigger well-timed explorations of new regions. Unlike end-to-end learning approaches, COAgents cleanly separates problem-agnostic search control from compact domain-specific encoding, facilitating adaptability across tasks. Extensive experiments on the CVRP and VRPTW benchmarks show that COAgents remains competitive with several learn-to-search baselines on CVRP and sets a new state of the art among learning-based methods on the more challenging VRPTW instances, reducing the gap to the best-known solutions by 14\% at $N\!=\!100$ and 44\% at $N\!=\!50$ relative to the strongest neural solver (POMO), and by 21\% and 40\% respectively relative to ALNS.
Code is available at https://github.com/mahdims/COAgents.
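To make the search-graph idea concrete, the following is a minimal illustrative sketch and not the authors' implementation (which lives in the repository above). It assumes a toy single-route problem on 2D points, and it replaces the three learned agents with hand-written placeholder rules: `select_node`, `select_move`, and `should_jump` are hypothetical names, as are the `patience` threshold and the 2-opt and segment-shuffle operators. The graph is stored as a list of nodes, each recording its parent and whether it was reached by a local move or a jump.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    tour: List[int]        # candidate solution: visiting order of customers
    cost: float            # total length of the closed tour
    parent: Optional[int]  # index of the node this one was derived from
    edge: str              # "root", "move" (local refinement) or "jump"

def tour_cost(tour: List[int], pts: List[Point]) -> float:
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def select_node(graph: List[Node], rng: random.Random) -> int:
    # Placeholder for a node-selection policy: sample among the 3 cheapest nodes.
    cheapest = sorted(range(len(graph)), key=lambda i: graph[i].cost)[:3]
    return rng.choice(cheapest)

def select_move(tour: List[int], rng: random.Random) -> List[int]:
    # Placeholder for a move-selection policy: a random 2-opt segment reversal.
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def should_jump(stall: int, patience: int) -> bool:
    # Placeholder for a jump policy: jump after `patience` non-improving steps.
    return stall >= patience

def jump(tour: List[int], rng: random.Random) -> List[int]:
    # Large perturbation: shuffle a contiguous segment covering half the tour.
    n = len(tour)
    k = max(2, n // 2)
    start = rng.randrange(n - k + 1)
    segment = tour[start:start + k]
    rng.shuffle(segment)
    return tour[:start] + segment + tour[start + k:]

def search(pts: List[Point], steps: int = 500, patience: int = 30,
           seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    root = list(range(len(pts)))
    graph = [Node(root, tour_cost(root, pts), None, "root")]
    best_cost, stall = graph[0].cost, 0
    for _ in range(steps):
        src = select_node(graph, rng)
        if should_jump(stall, patience):
            new_tour, edge, stall = jump(graph[src].tour, rng), "jump", 0
        else:
            new_tour, edge = select_move(graph[src].tour, rng), "move"
        node = Node(new_tour, tour_cost(new_tour, pts), src, edge)
        graph.append(node)  # the partial search graph grows by one node per step
        if node.cost < best_cost - 1e-12:
            best_cost, stall = node.cost, 0
        else:
            stall += 1
    return graph

if __name__ == "__main__":
    point_rng = random.Random(1)
    points = [(point_rng.random(), point_rng.random()) for _ in range(15)]
    g = search(points)
    print("root cost:", round(g[0].cost, 3))
    print("best cost:", round(min(n.cost for n in g), 3))
    print("jumps taken:", sum(n.edge == "jump" for n in g))
```

In this sketch the control loop never inspects what a tour means; only `tour_cost`, `select_move`, and `jump` know about the problem. That mirrors, in a very simplified way, the separation between search control and domain-specific encoding that the abstract describes.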