🤖 AI Summary
Existing neural solvers for multi-task Vehicle Routing Problems (VRPs) employ monolithic policies that overlook the compositional structure inherent among VRP variants, where complex variants are built from fundamental ones; this hinders effective transfer of specialized solver capabilities.
Method: We propose the State-Decomposable Markov Decision Process (SD-MDP) framework, the first to explicitly model both the shared structural priors and the compositional relationships across VRP variants. We theoretically prove that an optimal unified policy can be recovered via a latent-space mixture of specialized policies. Technically, we design low-rank adapter-based expert networks as task-specialized base solvers and integrate a learnable adaptive gating mechanism for dynamic policy mixing and state-embedding mapping.
Contribution/Results: Our approach achieves significant improvements in cross-task generalization, training efficiency, and solution quality across diverse VRP benchmarks, empirically validating the effectiveness of compositional generalization as a paradigm for neural VRP solving.
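The summary above describes a gated mixture of low-rank (LoRA) experts on top of a shared base network. The following minimal sketch illustrates that mechanism in pure Python with toy dimensions; all names (`moe_lora_forward`, `experts`, the gating matrix `G`) and the sizes are illustrative assumptions, not the paper's implementation or API.

```python
# Sketch of a gated mixture of LoRA experts (illustrative, not the paper's code):
# y = W x + sum_k g_k(x) * B_k A_k x, where g(x) is a softmax gate over K experts.
import math
import random

random.seed(0)

D_IN, D_OUT, RANK, K = 4, 3, 2, 3  # toy sizes: input dim, output dim, LoRA rank, #experts

def rand_mat(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Frozen base weight shared across all variants.
W = rand_mat(D_OUT, D_IN)

# One low-rank adapter (B_k @ A_k) per task-specialized expert.
experts = [{"A": rand_mat(RANK, D_IN), "B": rand_mat(D_OUT, RANK)} for _ in range(K)]

# Gating network: maps the input (a stand-in for the state embedding) to K logits.
G = rand_mat(K, D_IN)

def moe_lora_forward(x):
    """Base output plus the gate-weighted sum of the experts' low-rank updates."""
    gates = softmax(matvec(G, x))
    y = matvec(W, x)
    for g, ex in zip(gates, experts):
        delta = matvec(ex["B"], matvec(ex["A"], x))
        y = [y_i + g * d_i for y_i, d_i in zip(y, delta)]
    return y, gates

y, gates = moe_lora_forward([1.0, -0.5, 0.25, 2.0])
assert len(y) == D_OUT and abs(sum(gates) - 1.0) < 1e-9
```

Because the adapters are low-rank, each expert adds only `RANK * (D_IN + D_OUT)` parameters, which is why LoRA experts keep the per-variant cost small relative to duplicating the full solver.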
📝 Abstract
Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each of which is derivable from a common set of basis VRP variants. This oversight causes unified solvers to miss out on the potential benefits of basis solvers, each specialized for a basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to exploit the shared components across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth in the number of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of the basis state spaces associated with basis VRP variants. More crucially, this formulation inherently yields the optimal basis policy for each basis VRP variant. Furthermore, we develop a Latent Space-based SDMDP extension that incorporates both the optimal basis policies and a learnable mixture function to enable policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of SDMDP through the mixture function, which computes the state embedding as a mapping from the basis state embeddings generated by the optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes basis policies through specialized Low-Rank Adaptation (LoRA) experts and implements the mixture function via an adaptive gating mechanism. Extensive experiments across VRP variants demonstrate the superiority of MoSES over prior methods.
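The abstract's central construction is a state space expressed as the Cartesian product of basis state spaces, with a mixture function mapping basis state embeddings to a unified embedding. The toy sketch below illustrates that decomposition; the basis variants (capacity, time windows), the embedding functions, and the fixed mixture weights are all invented for illustration and stand in for the paper's learned components.

```python
# Sketch of an SDMDP-style state decomposition (illustrative assumptions only):
# a composite state is a tuple of basis states, each basis embedding reads only
# its own component, and a mixture maps basis embeddings to a unified embedding.
from dataclasses import dataclass

@dataclass(frozen=True)
class CapacityState:      # basis variant: capacitated VRP
    remaining_load: float

@dataclass(frozen=True)
class TimeWindowState:    # basis variant: VRP with time windows
    current_time: float

@dataclass(frozen=True)
class CompositeState:     # SDMDP state = Cartesian product of basis states
    cap: CapacityState
    tw: TimeWindowState

def embed_cap(s: CapacityState) -> list:
    # Toy basis embedding; a real basis policy would produce a learned vector.
    return [s.remaining_load, s.remaining_load ** 2]

def embed_tw(s: TimeWindowState) -> list:
    return [s.current_time, 1.0]

def mixture(state: CompositeState, weights=(0.7, 0.3)) -> list:
    """Map the basis state embeddings to a unified state embedding.

    Fixed weights stand in for the learnable mixture function / gating.
    """
    e_cap, e_tw = embed_cap(state.cap), embed_tw(state.tw)
    return [weights[0] * a + weights[1] * b for a, b in zip(e_cap, e_tw)]

s = CompositeState(CapacityState(remaining_load=0.5), TimeWindowState(current_time=2.0))
unified = mixture(s)  # unified embedding built only from basis embeddings
```

The point of the decomposition is that each basis solver never sees the other components, so adding a new constraint means adding one basis state space and re-learning only the mixture, rather than training a new monolithic solver per variant combination.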