🤖 AI Summary
This work addresses the limited out-of-distribution generalization of deep reinforcement learning (DRL) approaches to vehicle routing problems, which often stems from training on instances drawn from a single, typically uniform, distribution. To overcome this limitation, the authors propose a modular policy architecture, R2E-IG, with three key components: a Residual Refined Expert (R2E) module that enhances expert expressiveness through residual refinement, instance-level gating (IG) that learns distribution-aware instance representations and routes each input to suitable modules, and a Dynamic Weight Adaption (DWA) strategy that reweights training data across mixed distributions to emphasize the more informative ones. The proposed method is competitive with state-of-the-art baselines on both in-distribution and out-of-distribution instances across synthetic and benchmark datasets. It is also generic and can be integrated into existing DRL-based methods to further improve their performance.
📝 Abstract
In recent years, Deep Reinforcement Learning (DRL) has achieved substantial progress on Vehicle Routing Problems (VRPs). However, existing DRL-based methods are typically trained on instances generated from a uniform distribution, which limits their performance under real-world distribution shifts. In this paper, we aim to develop a generalization-oriented model that partitions the policy network into multiple modules and adaptively recombines them to form instance-specific policies during inference. Specifically, we propose Residual Refined Experts with Instance-level Gating (R2E-IG) to improve cross-distribution generalization. Our contributions are threefold: (1) we introduce a Residual Refined Expert (R2E) architecture that enhances expert expressiveness via residual refinement; (2) we design an instance-level gating mechanism that learns distribution-aware instance representations and routes inputs to suitable modules; (3) we propose a mixed-distribution training mechanism equipped with Dynamic Weight Adaption (DWA), which dynamically reweights training data from different distributions to emphasize the more informative ones. Extensive experiments show that R2E-IG achieves competitive performance against state-of-the-art baselines on both in-distribution and out-of-distribution instances across synthetic and benchmark datasets. Moreover, R2E-IG is generic and can be easily integrated into existing DRL-based methods to further improve performance.
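The abstract does not give the exact formulation, so the following is only an illustrative sketch of how the three ideas could fit together, not the authors' implementation. It assumes mean pooling to build the instance representation, a softmax gate shared by all nodes of an instance, experts of the form "input plus a ReLU refinement", and a softmax over recent per-distribution costs as the reweighting rule. All function names, weight shapes, and the `temperature` parameter are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_pool(node_embs):
    """Assumed instance representation: average of the node embeddings."""
    n = len(node_embs)
    return [sum(col) / n for col in zip(*node_embs)]


def instance_gate(node_embs, W_gate):
    """One gate distribution per instance, shared by all of its nodes."""
    return softmax(matvec(W_gate, mean_pool(node_embs)))


def residual_expert(x, W):
    """Assumed expert form: the input plus a ReLU refinement term."""
    return [xi + max(0.0, ri) for xi, ri in zip(x, matvec(W, x))]


def mixture_forward(node_embs, W_gate, expert_Ws):
    """Blend the expert outputs for every node using the instance-level gate."""
    gate = instance_gate(node_embs, W_gate)
    out = []
    for x in node_embs:
        expert_outs = [residual_expert(x, W) for W in expert_Ws]
        out.append([
            sum(g * e[d] for g, e in zip(gate, expert_outs))
            for d in range(len(x))
        ])
    return gate, out


def dynamic_weights(avg_costs, temperature=1.0):
    """Hypothetical reweighting: distributions with a higher recent average
    cost receive a larger share of the next training batch."""
    return softmax([c / temperature for c in avg_costs])
```

In this sketch the gate is computed once per instance rather than once per node, which is one plausible reading of "instance-level": every node of a routing instance is processed by the same mixture of experts. The reweighting rule is likewise only a stand-in for DWA, since the abstract does not say how "more informative" distributions are identified.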