MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver

📅 2025-06-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing reinforcement learning (RL)-based multi-task methods for neural combinatorial optimization can only train light decoder models on small-scale instances, and they generalize poorly to large-scale problems. To address this for multi-variant Vehicle Routing Problems (VRPs), the paper proposes a unified neural solver with two contributions: (1) MTL-KD, a multi-task learning framework that uses knowledge distillation to transfer policy knowledge from multiple RL-trained single-task models into a single heavy decoder model, without requiring labels; and (2) Random Reordering Re-Construction (R3C), a flexible inference strategy adapted to diverse VRP variants. Experiments on 6 seen and 10 unseen VRP variants with up to 1,000 nodes show that the method consistently achieves superior performance on both uniform and real-world benchmarks, indicating strong cross-task and cross-scale generalization.

📝 Abstract
Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables the efficient training of heavy decoder models with strong generalization ability. The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, facilitating label-free training and effectively improving the model's generalization ability across diverse tasks. In addition, we introduce a flexible inference strategy termed Random Reordering Re-Construction (R3C), which is specifically adapted for diverse VRP tasks and further boosts the performance of the multi-task model. Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks, demonstrating robust generalization abilities.
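The distillation step described in the abstract can be illustrated with a minimal sketch. It assumes that each single-task teacher and the shared student output logits over candidate next nodes at a decoding step, and that the student is trained to match the teacher's distribution with a forward KL divergence. The abstract does not specify the loss, so the KL form, the function names (`softmax`, `kl_divergence`, `distillation_loss`), and the toy teachers below are all assumptions made for illustration, not the paper's implementation.

```python
import math

def softmax(logits, mask):
    """Masked softmax over candidate next nodes; mask[i] is True if node i is feasible."""
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); entries with p_i == 0 (infeasible nodes) contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(batch, teachers, student):
    """Mean KL between each task's teacher policy and the shared student policy.

    batch: list of (task_name, state, mask) decoding steps (hypothetical layout).
    teachers: dict mapping task_name -> callable(state) returning logits.
    student: callable(task_name, state) returning logits.
    No ground-truth solutions are used: the teacher's distribution is the target.
    """
    total = 0.0
    for task, state, mask in batch:
        p_teacher = softmax(teachers[task](state), mask)
        p_student = softmax(student(task, state), mask)
        total += kl_divergence(p_teacher, p_student)
    return total / len(batch)

# Toy stand-ins for two pretrained single-task teachers (assumed, not real models).
teachers = {
    "CVRP": lambda state: [2.0, 0.5, -1.0, 0.0],
    "VRPTW": lambda state: [0.0, 1.5, 0.3, -0.5],
}

def matching_student(task, state):
    # A student that reproduces the teacher exactly.
    return teachers[task](state)

def uniform_student(task, state):
    # An untrained student with no preference among nodes.
    return [0.0, 0.0, 0.0, 0.0]

batch = [
    ("CVRP", None, [True, True, False, True]),   # node 2 is infeasible here
    ("VRPTW", None, [True, True, True, True]),
]
loss_match = distillation_loss(batch, teachers, matching_student)
loss_uniform = distillation_loss(batch, teachers, uniform_student)
```

In this toy setup the loss vanishes when the student reproduces its teachers and is positive for the uniform student. In practice the student would be a neural network trained by gradient descent on this kind of objective.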
Problem

Research questions and friction points this paper is trying to address.

Enhance generalization in multi-task vehicle routing problems
Train heavy decoder models via knowledge distillation
Improve performance across diverse VRP variants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge distillation for multi-task learning
Heavy decoder model with strong generalization
Random Reordering Re-Construction inference strategy
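The page names the R3C inference strategy but does not describe its mechanics. The sketch below is therefore only one assumed reading of "random reordering" plus "re-construction": at each iteration it randomly rotates and possibly reverses the current solution, picks a random segment, rebuilds that segment with a decoder, and keeps the result if it is shorter. The nearest-neighbour `greedy_rebuild` is a stand-in for the learned model, and the single cyclic route ignores the capacity, time-window, and other constraints that real VRP variants impose. All names here are hypothetical.

```python
import math
import random

def tour_length(tour, coords):
    """Length of a closed tour visiting the nodes in order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def greedy_rebuild(prev_node, nodes, coords):
    """Stand-in for the learned decoder: nearest-neighbour ordering starting after prev_node."""
    remaining = list(nodes)
    ordered = []
    current = prev_node
    while remaining:
        nxt = min(remaining, key=lambda n: math.dist(coords[current], coords[n]))
        remaining.remove(nxt)
        ordered.append(nxt)
        current = nxt
    return ordered

def random_reconstruct(tour, coords, iters=200, seed=0):
    """Repeatedly reorder the tour at random, rebuild a random segment, keep improvements."""
    rng = random.Random(seed)
    best = list(tour)
    best_len = tour_length(best, coords)
    n = len(best)
    if n < 3:
        return best, best_len
    for _ in range(iters):
        # Random reordering: rotating or reversing a closed tour leaves its length
        # unchanged but exposes a different segment to the decoder.
        shift = rng.randrange(n)
        cand = best[shift:] + best[:shift]
        if rng.random() < 0.5:
            cand.reverse()
        # Re-construction: keep cand[0] as the anchor and rebuild the next seg_len nodes.
        seg_len = rng.randint(2, n - 1)
        rebuilt = greedy_rebuild(cand[0], cand[1:1 + seg_len], coords)
        new_tour = [cand[0]] + rebuilt + cand[1 + seg_len:]
        new_len = tour_length(new_tour, coords)
        if new_len < best_len - 1e-12:
            best, best_len = new_tour, new_len
    return best, best_len

# Toy instance: 30 uniformly random points and a naive initial tour.
rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(30)]
initial = list(range(30))
initial_len = tour_length(initial, coords)
best, best_len = random_reconstruct(initial, coords)
```

Because a candidate is accepted only when it is strictly shorter, the returned tour is never worse than the initial one. With a learned decoder in place of `greedy_rebuild`, the same loop structure would spend extra inference time to refine a solution.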