🤖 AI Summary
This work addresses the challenge of solving capacitated vehicle routing problems (CVRP) with diverse objectives and complex constraints, such as time windows and backhaul requirements, within a unified framework. The authors propose a knowledge-embedded reinforcement learning approach that builds the "route-first, cluster-second" heuristic into the learning process. The method decomposes the problem into two stages: in the first, a reinforcement learning agent constructs an initial route; in the second, dynamic programming solves the clustering step exactly, and its results serve as feedback to guide the agent. To mitigate the partial observability induced by this decomposition, a unified history-enhanced context processing module is introduced. Experiments show that the proposed method achieves better solution quality than state-of-the-art learning-based methods across various CVRP variants, narrowing the gap to classical heuristics and generalizing well.
📝 Abstract
The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem with broad applications in logistics and transportation. Real-world CVRPs often involve diverse objectives and complex constraints, such as time windows or backhaul requirements, motivating the development of a unified solution framework. Recent reinforcement learning (RL) approaches have shown promise in combinatorial optimization, yet they rely on end-to-end learning and lack explicit problem-solving knowledge, which limits solution quality. In this paper, we propose a knowledge-embedded framework inspired by the Route-First Cluster-Second heuristic. It incorporates knowledge at two levels: (1) decomposing CVRPs into route-first and cluster-second subproblems, and (2) leveraging dynamic programming to solve the second subproblem, whose results guide the RL-based constructive solver in solving the first subproblem. To mitigate the partial observability caused by problem decomposition, we introduce a unified history-enhanced context processing module. Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based methods and a smaller gap to classical heuristics, and that it generalizes well across diverse CVRP variants.
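To make the cluster-second step concrete, the following is a minimal sketch of the classical dynamic-programming split for the plain capacitated case: given a fixed visiting order (the output of the route-first stage), it finds the cheapest way to cut that order into capacity-feasible routes. It is an illustration under stated assumptions, not the authors' implementation. The function name `split_tour`, its parameters, and the toy instance are hypothetical, and time windows and backhauls are not handled here.

```python
import math

def split_tour(tour, demand, dist, capacity, depot=0):
    """Optimally cut a fixed customer order into capacity-feasible routes.

    best[j] is the minimum cost of serving the first j customers of `tour`;
    pred[j] records where the last route of that optimal prefix starts.
    (Hypothetical helper for illustration; capacity constraint only.)
    """
    n = len(tour)
    best = [math.inf] * (n + 1)
    pred = [0] * (n + 1)
    best[0] = 0.0

    for i in range(n):
        if best[i] == math.inf:
            continue  # prefix of length i cannot be served feasibly
        load = 0
        path = 0.0  # cost from depot through tour[i..j], excluding the return leg
        for j in range(i, n):
            load += demand[tour[j]]
            if load > capacity:
                break  # extending this route further would violate capacity
            if j == i:
                path = dist[depot][tour[j]]
            else:
                path += dist[tour[j - 1]][tour[j]]
            total = best[i] + path + dist[tour[j]][depot]
            if total < best[j + 1]:
                best[j + 1] = total
                pred[j + 1] = i

    if best[n] == math.inf:
        raise ValueError("no feasible split: some demand exceeds capacity")

    # Walk the predecessor links backwards to recover the routes.
    routes = []
    j = n
    while j > 0:
        i = pred[j]
        routes.append(tour[i:j])
        j = i
    routes.reverse()
    return best[n], routes


# Toy instance (assumed for illustration): depot and customers on a line.
xs = [0, 1, 2, -1, -2]                      # index 0 is the depot
dist = [[abs(a - b) for b in xs] for a in xs]
demand = [0, 1, 1, 1, 1]
cost, routes = split_tour([1, 2, 3, 4], demand, dist, capacity=2)
print(cost, routes)
```

In a setup like the one the abstract describes, the negated `cost` could plausibly serve as the feedback signal for the RL agent that proposes `tour`; how the paper actually shapes that signal is not specified here, so treat this as an assumption.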