End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW

📅 2025-12-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the Chance-constrained Vehicle Routing Problem with Time Windows (C-VRPTW) under stochastic travel times, where two conflicting objectives, total travel time and route completion time (makespan), must be jointly optimized. We propose an end-to-end deep reinforcement learning framework featuring an attention-based policy network, integrated with multi-solution trajectory sampling and scenario clustering during training to explicitly model uncertainty and efficiently approximate the Pareto-optimal front. Our key contribution is the first incorporation of scenario clustering into end-to-end multi-objective RL for routing, ensuring both workforce compliance (via chance constraints) and operational efficiency. Experiments demonstrate that our method significantly outperforms three baseline approaches within acceptable computational time, achieving substantial improvements in solution-set quality (hypervolume) and diversity. The framework provides a scalable, automated decision-making solution for real-world logistics dispatch.
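To make the solution-set quality measure concrete, the following is a minimal sketch of how a two-objective Pareto front and its hypervolume can be computed when both objectives are minimized. The function names and the choice of reference point are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# Assumed layout: (total travel time, makespan), both to be minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # After sorting by (first, second), a point is non-dominated only if it
    # strictly improves the second objective over every point before it.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], reference: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    front = [p for p in pareto_front(points)
             if p[0] < reference[0] and p[1] < reference[1]]
    area = 0.0
    prev_second = reference[1]
    # Sweep left to right, adding one rectangle per front point.
    for first, second in front:
        area += (reference[0] - first) * (prev_second - second)
        prev_second = second
    return area
```

A larger hypervolume means the solution set covers more of the objective space relative to the reference point, which is why it is a common way to compare fronts produced by different methods.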

📝 Abstract

In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although different objectives could be incorporated. Learning-based methods offer substantial computational advantages, as they can repeatedly solve problems with limited intervention from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. To address this, we propose a model that simultaneously handles stochasticity and multi-objectivity, and we provide a refined training mechanism for this model that uses scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto front of good quality within acceptable run times compared to three baselines.
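The abstract does not say how scenarios are generated or clustered. As a rough illustration of the idea only, the sketch below samples travel-time scenarios with multiplicative lognormal noise and groups them with plain k-means, so that training could use a few weighted representative scenarios instead of every sample. The noise model, the clustering algorithm, and all names here are assumptions.

```python
import random
from typing import List, Tuple

Scenario = List[float]  # one flattened travel-time matrix (assumed layout)


def sample_scenarios(base: Scenario, n: int, sigma: float,
                     rng: random.Random) -> List[Scenario]:
    """Perturb each base travel time with multiplicative lognormal noise."""
    return [[t * rng.lognormvariate(0.0, sigma) for t in base]
            for _ in range(n)]


def _sq_dist(a: Scenario, b: Scenario) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_scenarios(scenarios: List[Scenario], k: int, iters: int,
                      rng: random.Random) -> Tuple[List[Scenario], List[float]]:
    """Plain k-means; returns centroids and the share of scenarios in each."""
    centroids = [list(s) for s in rng.sample(scenarios, k)]
    labels = [0] * len(scenarios)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: _sq_dist(s, centroids[j]))
                  for s in scenarios]
        # Update step: mean of members; an empty cluster keeps its centroid.
        for j in range(k):
            members = [s for s, lab in zip(scenarios, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    weights = [labels.count(j) / len(scenarios) for j in range(k)]
    return centroids, weights


rng = random.Random(0)
base_times = [10.0, 12.0, 8.0, 15.0]  # hypothetical base travel times
scenarios = sample_scenarios(base_times, n=200, sigma=0.2, rng=rng)
centroids, weights = cluster_scenarios(scenarios, k=5, iters=20, rng=rng)
```

Each centroid acts as a representative scenario, and its weight is the fraction of sampled scenarios it stands for. Evaluating a policy on 5 weighted centroids rather than 200 samples is the kind of saving that scenario clustering aims for.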

Problem

Research questions and friction points this paper is trying to address.

Addresses stochastic multi-objective optimization in vehicle routing

Handles travel time uncertainty and conflicting objectives like efficiency and regulations

Proposes an end-to-end deep learning model with scenario clustering for training
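The points above can be illustrated with a small scenario-based evaluation of a fixed set of routes. This is a simplified sketch: it ignores time windows and capacities, treats makespan as the duration of the longest route, and assumes the chance constraint has the form P(makespan <= shift limit) >= 1 - alpha. None of these details are confirmed by the source, and all names are hypothetical.

```python
from typing import List, Tuple

Route = List[int]            # node indices, starting and ending at depot 0
Matrix = List[List[float]]   # travel[i][j] = travel time from i to j


def route_duration(route: Route, travel: Matrix) -> float:
    """Sum of travel times along consecutive stops of one route."""
    return sum(travel[a][b] for a, b in zip(route, route[1:]))


def evaluate(routes: List[Route], scenarios: List[Matrix],
             max_shift: float) -> Tuple[float, float, float]:
    """Return (mean total travel time, mean makespan, P(makespan <= max_shift))."""
    totals: List[float] = []
    makespans: List[float] = []
    for travel in scenarios:
        durations = [route_duration(r, travel) for r in routes]
        totals.append(sum(durations))     # objective 1: efficiency
        makespans.append(max(durations))  # objective 2: longest shift
    n = len(scenarios)
    # Empirical probability that the longest route fits in the shift limit.
    on_time = sum(m <= max_shift for m in makespans) / n
    return sum(totals) / n, sum(makespans) / n, on_time


def satisfies_chance_constraint(on_time: float, alpha: float) -> bool:
    """Assumed form of the constraint: P(makespan <= limit) >= 1 - alpha."""
    return on_time >= 1.0 - alpha
```

The two objectives conflict because shortening the longest route usually means spreading customers over more vehicles or less direct routes, which tends to raise total travel time.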

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end deep reinforcement learning for routing

Attention mechanism with multiple solution trajectories

Scenario clustering to reduce training time
