SED2AM: Solving Multi-Trip Time-Dependent Vehicle Routing Problem using Deep Reinforcement Learning

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the Multi-Trip Time-Dependent Vehicle Routing Problem (MTTDVRP) with maximum working-hour constraints in urban logistics. It proposes SED2AM (Simultaneous Encoder and Dual Decoder Attention Model), an end-to-end deep reinforcement learning (DRL) framework with three main components: a graph encoder with a temporal locality inductive bias that accounts for time-varying travel times, a dual-decoder architecture (one decoder for vehicle selection, one for trip construction), and a dual state representation covering both fleet state and routing state. The Transformer-based policy network is trained via Proximal Policy Optimization (PPO). Evaluated on real-world datasets from two major Canadian cities, the method outperforms existing DRL-based and metaheuristic baselines and generalizes to larger-scale instances.

📝 Abstract
Deep reinforcement learning (DRL)-based frameworks, featuring Transformer-style policy networks, have demonstrated their efficacy across various vehicle routing problem (VRP) variants. However, the application of these methods to the multi-trip time-dependent vehicle routing problem (MTTDVRP) with maximum working hours constraints, a pivotal element of urban logistics, remains largely unexplored. This paper introduces a DRL-based method called the Simultaneous Encoder and Dual Decoder Attention Model (SED2AM), tailored for the MTTDVRP with maximum working hours constraints. The proposed method introduces a temporal locality inductive bias to the encoding module of the policy network, enabling it to effectively account for the time-dependency in travel distance or time. The decoding module of SED2AM includes a vehicle selection decoder that selects a vehicle from the fleet, effectively associating trips with vehicles for functional multi-trip routing. The decoding module also includes a trip construction decoder that constructs trips for the selected vehicles. The policy model is equipped with two classes of state representations, fleet state and routing state, providing the information needed for effective route construction in the presence of maximum working hours constraints. Experimental results using real-world datasets from two major Canadian cities not only show that SED2AM outperforms the current state-of-the-art DRL-based and metaheuristic-based baselines but also demonstrate its generalizability to larger-scale problems.

Problem

Research questions and friction points this paper is trying to address.

Addresses multi-trip time-dependent vehicle routing with working hour constraints.
Introduces SED2AM, a deep reinforcement learning model for urban logistics.
Demonstrates superior performance and scalability on real-world datasets.
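
To make the problem setting above concrete, the following is a minimal sketch of how one vehicle's sequence of trips could be evaluated under time-dependent travel times and a maximum working-hours limit. It is not taken from the paper: the bucketed travel-time lookup, the toy tables `TT` and `DEMAND`, and the helper names `travel_time` and `evaluate_vehicle` are all illustrative assumptions, and service times are ignored for brevity.

```python
DEPOT = 0  # assumption: node 0 is the depot

# Hypothetical travel-time tables (hours), one matrix per one-hour time bucket.
# Bucket 0 is off-peak, bucket 1 is peak; the last bucket is reused afterwards.
TT = [
    [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]],
    [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
DEMAND = [0, 1, 1]  # demand per node; the depot has none


def travel_time(tt, i, j, depart, bucket_len=1.0):
    """Travel time from i to j, read from the bucket containing `depart`."""
    bucket = min(int(depart // bucket_len), len(tt) - 1)
    return tt[bucket][i][j]


def evaluate_vehicle(trips, tt, demand, capacity, max_hours, bucket_len=1.0):
    """Return (finish_time, feasible) for one vehicle's ordered list of trips.

    Each trip is a list of customer nodes; the vehicle starts at the depot
    at time 0 and returns to the depot at the end of every trip.
    """
    clock = 0.0
    for trip in trips:
        # Capacity applies per trip, since the vehicle reloads at the depot.
        if sum(demand[c] for c in trip) > capacity:
            return clock, False
        prev = DEPOT
        for node in list(trip) + [DEPOT]:
            # The departure time determines which travel-time bucket is used.
            clock += travel_time(tt, prev, node, clock, bucket_len)
            prev = node
    # Working hours accumulate across all trips of the same vehicle.
    return clock, clock <= max_hours
```

With these toy tables, `evaluate_vehicle([[1], [2]], TT, DEMAND, capacity=1, max_hours=3.0)` returns `(3.0, True)`: the first trip runs entirely off-peak and takes 1.0 hour, while the second departs in the peak bucket and takes 2.0 hours. Lowering `max_hours` to 2.5 makes the same plan infeasible, which is the kind of cross-trip coupling that the fleet state is meant to expose to the policy.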

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-style policy networks for MTTDVRP
Temporal locality inductive bias in encoding
Dual decoder for vehicle selection and trip construction
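
The two-decoder idea listed above can be sketched as a two-stage masked selection: first pick a vehicle, then pick that vehicle's next node. The snippet below is only an illustration under strong assumptions. The scores are plain numbers standing in for the attention logits a trained policy would produce, the masks stand in for feasibility derived from fleet state and routing state, and the function names `masked_softmax` and `decode_step` are hypothetical, not the paper's.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    if not any(mask):
        raise ValueError("at least one entry must be feasible")
    top = max(s for s, ok in zip(scores, mask) if ok)  # for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(vehicle_scores, vehicle_mask, node_scores, node_masks):
    """One greedy decoding step: choose a vehicle, then that vehicle's next node.

    vehicle_mask[v] is False when vehicle v cannot be used (for example, no
    working hours left); node_masks[v][n] is False when node n is infeasible
    for vehicle v (for example, already served or over capacity).
    Returns (vehicle, node, joint probability of the greedy choice).
    """
    p_vehicle = masked_softmax(vehicle_scores, vehicle_mask)
    v = max(range(len(p_vehicle)), key=p_vehicle.__getitem__)
    p_node = masked_softmax(node_scores[v], node_masks[v])
    n = max(range(len(p_node)), key=p_node.__getitem__)
    return v, n, p_vehicle[v] * p_node[n]
```

For example, with `vehicle_scores=[1.0, 2.0]` and `vehicle_mask=[True, False]`, vehicle 0 is selected even though vehicle 1 scores higher, because vehicle 1 is masked out. Factoring the action this way keeps each softmax small (fleet size, then node count) instead of scoring every vehicle-node pair jointly.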