A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of solving capacitated vehicle routing problems (CVRP) with diverse objectives and complex constraints, such as time windows and backhaul requirements, within a single unified framework. The authors propose a knowledge-embedded reinforcement learning approach that builds the “route-first, cluster-second” heuristic into the learning process. The method decomposes the problem into two stages: in the first stage, a reinforcement learning agent constructs an initial route; in the second stage, dynamic programming solves the clustering subproblem, and its results are fed back to guide the agent. To mitigate the partial observability induced by this decomposition, a history-enhanced context processing module is introduced. Experiments show that the proposed method achieves better solution quality than state-of-the-art learning-based methods across diverse CVRP variants and narrows the gap to classical heuristics, indicating strong generalization.

📝 Abstract

The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem with broad applications in logistics and transportation. Real-world CVRPs often involve diverse objectives and complex constraints, such as time windows or backhaul requirements, motivating the development of a unified solution framework. Recent reinforcement learning (RL) approaches have shown promise in combinatorial optimization, yet they rely on end-to-end learning and lack explicit problem-solving knowledge, limiting solution quality. In this paper, we propose a knowledge-embedded framework inspired by the Route-First Cluster-Second heuristic. It incorporates knowledge at two levels: (1) decomposing CVRPs into the route-first and cluster-second subproblems, and (2) leveraging dynamic programming to solve the second subproblem, whose results guide the RL-based constructive solver in solving the first subproblem. To mitigate partial observability caused by problem decomposition, we introduce a unified history-enhanced context processing module. Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based methods, with a smaller gap to classical heuristics, demonstrating strong generalization across diverse CVRP variants.
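
To make the cluster-second step concrete, the following is a minimal sketch of a classic dynamic-programming split for the plain capacity-constrained case. It is not the authors' implementation. It assumes a fixed customer order (a "giant tour", which in the paper's setting would come from the RL solver), Euclidean distances, and a single capacity limit. The function name `split_giant_tour` and all parameter names are hypothetical. Time windows or backhauls would need extra feasibility checks in the inner loop.

```python
from math import hypot, inf


def split_giant_tour(tour, coords, demand, capacity, depot=0):
    """Split a fixed customer order into capacity-feasible routes of minimum total length.

    Hypothetical helper for illustration: best[j] is the cheapest cost of
    serving the first j customers of `tour` with consecutive routes.
    """
    def dist(a, b):
        return hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])

    n = len(tour)
    best = [inf] * (n + 1)   # best[j]: min cost to serve tour[:j]
    pred = [0] * (n + 1)     # pred[j]: start index of the last route in that solution
    best[0] = 0.0

    for i in range(n):
        if best[i] == inf:
            continue
        load = 0
        length = 0.0
        # Try every route that starts at tour[i] and ends at tour[j].
        for j in range(i, n):
            load += demand[tour[j]]
            if load > capacity:
                break
            if j == i:
                length = dist(depot, tour[j])
            else:
                length += dist(tour[j - 1], tour[j])
            cost = best[i] + length + dist(tour[j], depot)
            if cost < best[j + 1]:
                best[j + 1] = cost
                pred[j + 1] = i

    if best[n] == inf:
        raise ValueError("no feasible split: a customer demand exceeds capacity")

    # Walk the predecessor links backwards to recover the routes.
    routes = []
    j = n
    while j > 0:
        i = pred[j]
        routes.append(tour[i:j])
        j = i
    routes.reverse()
    return best[n], routes


# Toy instance: depot at index 0, four unit-demand customers, capacity 2.
coords = [(0, 0), (3, 0), (4, 0), (0, 5), (0, 6)]
demand = [0, 1, 1, 1, 1]
total_cost, routes = split_giant_tour([1, 2, 3, 4], coords, demand, capacity=2)
```

In a setup like the one the abstract describes, the total cost returned by such a split could serve as the feedback signal for the tour-constructing agent. How the paper turns the dynamic-programming result into a training signal is not specified here, so that link is an assumption.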

Problem

Research questions and friction points this paper is trying to address.

Capacitated Vehicle Routing Problem

Reinforcement Learning

Combinatorial Optimization

Problem Decomposition

Generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-Embedded Reinforcement Learning

Route-First Cluster-Second

Dynamic Programming