🤖 AI Summary
This work addresses the limited long-horizon planning capability of existing neural policies for vehicle routing problems, which typically predict only the next node and therefore make myopic decisions. To overcome this, the authors propose Multi-node Lookahead Prediction (MnLP), a training strategy that attaches a causal, discardable multi-step prediction module during training to improve the model's understanding of long-range context without adding inference overhead. MnLP adds auxiliary supervision losses at multiple lookahead depths, is compatible with diverse network architectures, and consistently improves generalization across problem scales, data distributions, and real-world benchmarks, outperforming existing training paradigms.
📝 Abstract
Neural policies have shown promise in solving vehicle routing problems due to their reduced reliance on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they primarily focus on next-node prediction for solution construction, resulting in myopic decision-making that undermines long-horizon planning capacity. To address this, we introduce Multi-node Lookahead Prediction (MnLP), a novel training strategy that extends the supervised learning paradigm to predict multiple future nodes simultaneously. We incorporate causal and discardable MnLP modules that operate exclusively during training, enabling models to anticipate multi-step decisions while preserving inference-time efficiency. By incorporating multi-depth auxiliary supervision into the loss function, MnLP equips neural policies with long-range contextual understanding. Experimentally, MnLP outperforms existing training methods, improving the generalization capability of neural policies across various problem sizes, distributions, and real-world benchmarks. Moreover, MnLP can be seamlessly integrated into diverse neural architectures without introducing additional inference overhead.
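To make the idea of multi-depth auxiliary supervision concrete, the following is a minimal sketch of how a next-node loss might be combined with lookahead losses at one training step. It is not the authors' implementation: the function names, the averaging over depths, and the `aux_weight` coefficient are all assumptions made for illustration, since the abstract does not specify how the depth-wise terms are weighted.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multi_depth_loss(logits_per_depth, tour, step, aux_weight=0.5):
    """Next-node loss plus averaged lookahead losses (hypothetical form).

    logits_per_depth[d] scores every node as the candidate for position
    step + 1 + d of the reference tour. Depth 0 is the ordinary next-node
    head; depths >= 1 stand in for the training-only lookahead modules.
    """
    main = cross_entropy(logits_per_depth[0], tour[step + 1])

    aux_terms = []
    for d in range(1, len(logits_per_depth)):
        pos = step + 1 + d
        if pos >= len(tour):
            break  # no reference label beyond the end of the tour
        aux_terms.append(cross_entropy(logits_per_depth[d], tour[pos]))

    # Assumption: auxiliary depths are averaged, then scaled by aux_weight.
    aux = sum(aux_terms) / len(aux_terms) if aux_terms else 0.0
    return main + aux_weight * aux


# Toy example: 4 nodes, reference tour 0 -> 2 -> 1 -> 3, uniform scores.
tour = [0, 2, 1, 3]
uniform = [[0.0] * 4 for _ in range(3)]  # depth 0 plus two lookahead depths
loss = multi_depth_loss(uniform, tour, step=0)
```

In this sketch only the depth-0 logits would be needed at inference time, which mirrors the abstract's claim that the lookahead modules can be discarded after training. Setting `aux_weight=0` recovers plain next-node supervision.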