🤖 AI Summary
Foundation-style models for vehicle routing generalize poorly in the multi-task, multi-distribution setting (MTMDVRP) and struggle to adapt to the heterogeneous customer distributions found in real-world scenarios. To address this, the paper introduces SHIELD, a unified model for MTMDVRP. Methodologically, it combines sparse computation (Mixture-of-Depths) with a context-based clustering layer, which together act as inductive biases toward sparsity and hierarchy. Built on a deeper decoder, the Mixture-of-Depths component lets the model choose which nodes use or skip each decoder layer and so allocate computation adaptively, while the clustering layer exploits hierarchical structure in the problems to produce better local representations. Experiments on nine real-world maps with 16 VRP variants each (144 map-variant combinations) show that the approach outperforms existing methods, with improved generalization to unseen tasks and distributions. The work is positioned as a step toward foundation models for real-world routing optimization.
📝 Abstract
Recent advances toward foundation models for routing problems have shown the great potential of a unified deep model for various VRP variants. However, they overlook the complex real-world customer distributions. In this work, we advance the Multi-Task VRP (MTVRP) setting to the more realistic yet challenging Multi-Task Multi-Distribution VRP (MTMDVRP) setting, and introduce SHIELD, a novel model that leverages both sparsity and hierarchy principles. Building on a deeper decoder architecture, we first incorporate the Mixture-of-Depths (MoD) technique to enforce sparsity. This improves both efficiency and generalization by allowing the model to dynamically select which nodes use or skip each decoder layer, providing the capacity needed to adaptively allocate computation for learning task- and distribution-specific as well as shared representations. We also develop a context-based clustering layer that exploits the presence of hierarchical structures in the problems to produce better local representations. These two designs inductively bias the network to identify key features that are common across tasks and distributions, leading to significantly improved generalization on unseen ones. Our empirical results demonstrate the superiority of our approach over existing methods on 9 real-world maps with 16 VRP variants each.
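
To make the "use or skip each decoder layer" idea concrete, the following is a minimal sketch of top-k routing in the general spirit of Mixture-of-Depths. It is not SHIELD's implementation: the abstract does not specify the router, the gating function, or the capacity, so `mod_layer`, `router_w`, `capacity`, the sigmoid gate, and `toy_layer` are all hypothetical names and choices made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mod_layer(nodes, router_w, layer_fn, capacity=0.5):
    """Apply layer_fn only to the top-k nodes by router score; others skip.

    nodes:     list of node embeddings (lists of floats)
    router_w:  hypothetical linear router weights, one per embedding dim
    layer_fn:  stand-in for a decoder block (attention + feed-forward)
    capacity:  assumed fraction of nodes processed by this layer
    """
    scores = [dot(router_w, h) for h in nodes]
    k = max(1, int(capacity * len(nodes)))
    # Indices of the k highest-scoring nodes.
    ranked = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])

    out = []
    for i, h in enumerate(nodes):
        if i in chosen:
            # Assumption: gate the block output with a sigmoid of the score
            # so the router receives a learning signal in a real model.
            gate = 1.0 / (1.0 + math.exp(-scores[i]))
            update = layer_fn(h)
            out.append([x + gate * u for x, u in zip(h, update)])
        else:
            # Skipped nodes pass through on the residual path unchanged.
            out.append(list(h))
    return out, chosen


def toy_layer(h):
    # Placeholder computation; a real decoder block would go here.
    return [2.0 * x for x in h]


nodes = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, -1.0]]
router_w = [1.0, 0.0]
out, chosen = mod_layer(nodes, router_w, toy_layer, capacity=0.5)
print(sorted(chosen))  # nodes that were processed by the layer
```

With a capacity of 0.5, only half of the nodes incur the cost of the block at this layer, which is where the efficiency gain comes from; stacking several such layers lets different nodes receive different amounts of computation.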