Learning to Search for Vehicle Routing with Multiple Time Windows

📅 2025-05-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). We propose a Reinforcement Learning-driven Adaptive Variable Neighborhood Search (RL-AVNS) algorithm. Methodologically, we introduce, for the first time, an RL-based operator selection mechanism that jointly leverages real-time solution states and historical experience; incorporate a customer-level time-flexibility metric to strengthen the perturbation (shaking) strategy; and design a Transformer-based policy network to guide intelligent local search. Compared against conventional VNS, classical AVNS, and state-of-the-art learning-enhanced heuristics, RL-AVNS achieves significant improvements in solution quality (a 3.2%–7.8% average reduction in total cost) on real-world vending-machine replenishment instances. The algorithm demonstrates strong generalization capability and scalability, efficiently solving large-scale, highly constrained VRPMTW instances, and establishes an extensible, intelligent optimization paradigm for dynamic logistics scheduling.
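The operator-selection idea described above can be sketched as a learned preference over neighborhood operators, updated from the cost improvements each application yields. This is a deliberately simplified stand-in for the paper's Transformer-based policy (which also conditions on the solution state); the class name, reward scheme, and parameters here are illustrative assumptions, not the authors' implementation:

```python
import math
import random


class OperatorSelector:
    """Softmax policy over neighborhood operators, updated from observed rewards.

    A minimal sketch of RL-driven operator selection in an AVNS loop; the real
    RL-AVNS policy is a neural network conditioned on the solution state.
    """

    def __init__(self, operators, lr=0.1, temperature=1.0):
        self.operators = list(operators)
        self.scores = {op: 0.0 for op in self.operators}
        self.lr = lr
        self.temperature = temperature

    def select(self, rng=random):
        # Sample an operator with probability proportional to exp(score / T).
        weights = [math.exp(self.scores[op] / self.temperature) for op in self.operators]
        r = rng.random() * sum(weights)
        for op, w in zip(self.operators, weights):
            r -= w
            if r <= 0:
                return op
        return self.operators[-1]

    def update(self, op, reward):
        # Move the chosen operator's score toward the observed reward
        # (e.g. the cost reduction achieved by applying it).
        self.scores[op] += self.lr * (reward - self.scores[op])
```

In a VNS-style loop one would call `select()` to pick the next operator, apply it, and feed the resulting cost change back via `update()`, so operators that keep improving the incumbent are chosen more often.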

๐Ÿ“ Abstract
In this study, we propose a reinforcement learning-based adaptive variable neighborhood search (RL-AVNS) method designed for effectively solving the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). Unlike traditional adaptive approaches that rely solely on historical operator performance, our method integrates a reinforcement learning framework to dynamically select neighborhood operators based on real-time solution states and learned experience. We introduce a fitness metric that quantifies customers' temporal flexibility to improve the shaking phase, and employ a transformer-based neural policy network to intelligently guide operator selection during the local search. Extensive computational experiments are conducted on realistic scenarios derived from the replenishment of unmanned vending machines, characterized by multiple clustered replenishment windows. Results demonstrate that RL-AVNS significantly outperforms traditional variable neighborhood search (VNS), adaptive VNS (AVNS), and state-of-the-art learning-based heuristics, achieving substantial improvements in solution quality and computational efficiency across various instance scales and time window complexities. Particularly notable is the algorithm's capability to generalize effectively to problem instances not encountered during training, underscoring its practical utility for complex logistics scenarios.
Problem

Research questions and friction points this paper is trying to address.

Solves Vehicle Routing Problem with Multiple Time Windows (VRPMTW)
Dynamically selects neighborhood operators via reinforcement learning
Improves solution quality and efficiency for complex logistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning-based adaptive variable neighborhood search
Fitness metric for temporal flexibility in shaking
Transformer-based neural policy for operator selection
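The temporal-flexibility idea in the list above can be illustrated as a slack-based score: a customer whose time windows leave plenty of room around its service time is cheap to displace and reinsert during shaking. The formula and names below are an assumption for illustration, not the paper's exact metric:

```python
def time_flexibility(windows, service_time):
    """Total scheduling slack a customer offers across its time windows.

    windows: list of (earliest, latest) service-start intervals.
    service_time: time needed to serve the customer.
    Higher values suggest the customer is easier to move during a shake.
    """
    slack = 0.0
    for earliest, latest in windows:
        # Slack of one window: its width minus the service duration, floored at 0.
        slack += max(0.0, (latest - earliest) - service_time)
    return slack


# A customer with two wide windows is more flexible than one with a single tight window.
flexible = time_flexibility([(8, 12), (14, 18)], service_time=1)  # 3 + 3 = 6.0
tight = time_flexibility([(9, 10)], service_time=1)               # 0.0
```

A perturbation strategy could then bias removal toward high-flexibility customers, since they are the least likely to become infeasible when reinserted elsewhere.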
Kuan Xu
Nanyang Technological University
robotics, visual SLAM
Zhiguang Cao
Singapore Management University
Learning to Optimize, Neural Combinatorial Optimization, Computational Intelligence
Chenlong Zheng
International Institute of Finance, School of Management, University of Science and Technology of China, 230026, P.R. China
Linong Liu
International Institute of Finance, School of Management, University of Science and Technology of China, 230026, P.R. China