Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP

📅 2025-10-24
🤖 AI Summary
Problem: Existing neural solvers for multi-task Vehicle Routing Problems (VRPs) employ monolithic policies. They overlook the compositional structure inherent among VRP variants (complex variants are built from fundamental ones), which hinders effective transfer of specialized solver capabilities. Method: We propose the State-Decomposable Markov Decision Process (SDMDP) framework, the first to explicitly model both shared structural priors and compositional relationships across VRP variants. We theoretically prove that, under mild assumptions, an optimal unified policy can be recovered via a latent-space mixture of specialized policies. Technically, we design low-rank adapter (LoRA)-based expert networks as task-specialized basis solvers and integrate a learnable adaptive gating mechanism for dynamic policy mixing and state-embedding mapping. Contribution/Results: Our approach achieves significant improvements in cross-task generalization, training efficiency, and solution quality across diverse VRP benchmarks, empirically validating compositional generalization as a paradigm for neural VRP solving.

📝 Abstract
Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each derivable from a common set of basis VRP variants. This critical oversight causes unified solvers to miss out on the potential benefits of basis solvers, each specialized for a basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces associated with basis VRP variants. More crucially, this formulation inherently yields the optimal basis policy for each basis VRP variant. Furthermore, a Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function to enable policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of SDMDP through the mixture function, which computes the state embedding as a mapping from the basis state embeddings generated by the optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes basis policies through specialized Low-Rank Adaptation (LoRA) experts and implements the mixture function via an adaptive gating mechanism. Extensive experiments conducted across VRP variants showcase the superiority of MoSES over prior methods.
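The central modeling step above, expressing a composite variant's state space as the Cartesian product of basis state spaces, can be illustrated with a toy sketch. The basis variant names, the tiny discretized state values, and the helpers `composite_state_space` and `decompose` are all hypothetical and are not taken from the paper. Real VRP states are richer (continuous loads and times, plus components shared by all variants such as the current node), and this sketch omits them.

```python
from itertools import product

# Hypothetical, tiny discretized basis state spaces (illustration only).
BASIS_STATE_SPACES = {
    "capacity": [0, 1, 2],         # remaining vehicle load
    "time_window": [0, 5, 10],     # current time
    "backhaul": [False, True],     # whether a backhaul customer has been served
}


def composite_state_space(variant):
    """State space of a composite variant: Cartesian product of its basis state spaces."""
    return list(product(*(BASIS_STATE_SPACES[b] for b in variant)))


def decompose(state, variant):
    """Split a composite state back into one component per basis variant."""
    return dict(zip(variant, state))


variant = ("capacity", "time_window")
S = composite_state_space(variant)
print(len(S))                    # 9 = 3 * 3
print(decompose(S[4], variant))  # {'capacity': 1, 'time_window': 5}
```

In this picture, a basis policy could read only its own component of `decompose(state, variant)`, which is the intuition behind specializing one solver per basis variant.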
Problem

Developing specialized neural solvers for multi-task vehicle routing problems
Overcoming limitations of unified solvers by reusing basis VRP policies
Mitigating exponential growth of trained neural solvers through decomposition
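A quick count shows why reusing basis solvers matters. Assume, purely for illustration, that every non-empty combination of k basis variants is a valid VRP variant. Training one dedicated solver per variant then requires 2^k - 1 solvers, whereas a decomposition-based approach needs only the k basis solvers. The labels below (capacity, open routes, backhauls, duration limits, time windows) are hypothetical. The abstract does not list the basis variants, and real benchmarks may exclude some combinations.

```python
from itertools import combinations


def enumerate_variants(basis):
    """All non-empty combinations of basis variants (assumes every subset is a valid variant)."""
    return [combo for r in range(1, len(basis) + 1) for combo in combinations(basis, r)]


# Hypothetical basis labels: capacity, open routes, backhauls, duration limits, time windows.
basis = ["C", "O", "B", "L", "TW"]
variants = enumerate_variants(basis)

print(len(variants))  # 31 dedicated solvers, one per variant (2**5 - 1)
print(len(basis))     # 5 basis solvers to reuse across all variants
```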
Innovation

State-Decomposable MDP reformulates VRP state space
Latent space mixture function reuses optimal basis policies
Mixture-of-Specialized-Experts Solver implements LoRA experts
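The MoSES design, with basis policies realized as LoRA experts on a shared backbone and combined by an adaptive gate, can be sketched in NumPy. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation. The class names `LoRAExpert` and `GatedMixture`, the single linear gating layer with a softmax, and all dimensions are hypothetical, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


class LoRAExpert:
    """Hypothetical basis expert: frozen base weights W plus a low-rank update B @ A."""

    def __init__(self, W, rank, rng):
        d_out, d_in = W.shape
        self.W = W                                     # shared backbone weights (frozen)
        self.A = rng.normal(0.0, 0.1, (rank, d_in))    # trainable down-projection
        # Standard LoRA zero-initializes B; small random values are used here
        # only so that the toy experts produce different outputs.
        self.B = rng.normal(0.0, 0.1, (d_out, rank))

    def __call__(self, h):
        # h: (n, d_in) -> (n, d_out)
        return h @ (self.W + self.B @ self.A).T


def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class GatedMixture:
    """Hypothetical adaptive gate: per-input softmax weights over expert embeddings."""

    def __init__(self, experts, d_in, rng):
        self.experts = experts
        self.Wg = rng.normal(0.0, 0.1, (d_in, len(experts)))  # single linear gating layer

    def __call__(self, h):
        g = softmax(h @ self.Wg)                                # (n, E) mixing weights
        outs = np.stack([e(h) for e in self.experts], axis=1)   # (n, E, d_out)
        return (g[..., None] * outs).sum(axis=1), g             # mixed embedding (n, d_out)


d_in, d_out, rank, n_experts = 16, 16, 4, 3
W = rng.normal(0.0, 0.1, (d_out, d_in))
experts = [LoRAExpert(W, rank, rng) for _ in range(n_experts)]
mix = GatedMixture(experts, d_in, rng)

h = rng.normal(size=(5, d_in))   # five toy state embeddings
z, g = mix(h)
print(z.shape, g.shape)          # (5, 16) (5, 3)
```

Because every expert shares the frozen matrix `W` and adds only a rank-`r` update, each additional expert costs `r * (d_in + d_out)` parameters instead of `d_in * d_out`. This is the usual motivation for LoRA-based experts.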

Authors

Yuxin Pan, The Hong Kong University of Science and Technology
Zhiguang Cao, Singapore Management University (Learning to Optimize; Neural Combinatorial Optimization; Computational Intelligence)
Chengyang Gu, The Hong Kong University of Science and Technology (Guangzhou)
Liu Liu, Tencent AI Lab
Peilin Zhao, Tencent AI Lab, Shanghai Jiao Tong University
Yize Chen, Assistant Professor, University of Alberta (Machine Learning; Power Systems; Optimization; Control)
Fangzhen Lin, affiliation not listed