AI Summary
This paper addresses the Capacitated Location-Routing Problem (CLRP) and its open variant (OCLRP), which jointly optimize facility location and vehicle routing under stringent capacity constraints, inducing strong decision interdependence. To tackle this challenge, we formulate CLRP as a multi-stage Markov Decision Process (MDP) for the first time and propose an end-to-end deep reinforcement learning (DRL) framework. Built upon an encoder-decoder architecture, our method introduces a heterogeneous query attention mechanism that dynamically adapts to the semantic requirements of the two distinct decision stages, location selection and route construction. This design unifies the modeling of location-routing coordination and enables scalable DRL-based optimization. Extensive experiments on synthetic and benchmark datasets demonstrate that our approach significantly outperforms classical heuristics and state-of-the-art DRL baselines in both solution quality and generalization capability.
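To make the multi-stage MDP view concrete, the following is a minimal sketch of one possible formulation, not the paper's actual state, action, or reward definition. It assumes that every route starts with a "location" decision (pick a depot, paying its opening cost once) followed by "routing" decisions (visit a feasible customer or end the route), that every single demand fits in one vehicle, and that total depot capacity is ample. All names (`State`, `feasible_actions`, `step`, `greedy_policy`, `toy_instance`) are hypothetical, and the greedy policy merely stands in for a learned one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class State:
    coords: dict                 # node id -> (x, y), depots and customers together
    demand: dict                 # customer id -> demand
    depot_cap: dict              # depot id -> remaining depot capacity
    open_cost: dict              # depot id -> fixed opening cost
    vehicle_cap: float
    open_variant: bool = False   # True mimics OCLRP: no return leg is charged
    stage: str = "location"      # "location" or "routing"
    opened: set = field(default_factory=set)
    unserved: set = field(init=False, default_factory=set)
    depot: object = None         # start depot of the current route
    pos: object = None           # current vehicle position
    load: float = 0.0            # remaining vehicle capacity
    cost: float = 0.0

    def __post_init__(self):
        self.unserved = set(self.demand)


def dist(s, a, b):
    return math.dist(s.coords[a], s.coords[b])


def done(s):
    return not s.unserved and s.stage == "location"


def feasible_actions(s):
    """Action mask: which depots or customers may be chosen in this stage."""
    if done(s):
        return []
    if s.stage == "location":
        need = min(s.demand[c] for c in s.unserved)
        return [d for d, cap in s.depot_cap.items() if cap >= need]
    limit = min(s.load, s.depot_cap[s.depot])
    actions = [c for c in s.unserved if s.demand[c] <= limit]
    if s.pos != s.depot:
        actions.append(s.depot)  # choosing the start depot ends the route
    return actions


def step(s, a):
    """Apply one decision and accumulate its cost (assumed cost model)."""
    if s.stage == "location":
        if a not in s.opened:
            s.opened.add(a)
            s.cost += s.open_cost[a]
        s.depot = s.pos = a
        s.load = s.vehicle_cap
        s.stage = "routing"
    elif a == s.depot:
        if not s.open_variant:   # closed CLRP pays for the return leg
            s.cost += dist(s, s.pos, a)
        s.pos = a
        s.stage = "location"
    else:
        s.cost += dist(s, s.pos, a)
        s.load -= s.demand[a]
        s.depot_cap[s.depot] -= s.demand[a]
        s.unserved.remove(a)
        s.pos = a


def greedy_policy(s, actions):
    """Placeholder policy: cheapest depot, then nearest feasible customer."""
    if s.stage == "location":
        return min(actions, key=lambda d: 0.0 if d in s.opened else s.open_cost[d])
    customers = [a for a in actions if a != s.depot]
    if not customers:
        return s.depot
    return min(customers, key=lambda c: dist(s, s.pos, c))


def rollout(s, policy):
    trace = []
    while not done(s):
        actions = feasible_actions(s)
        if not actions:
            raise RuntimeError("dead end: no feasible action")
        a = policy(s, actions)
        trace.append((s.stage, a))
        step(s, a)
    return s.cost, trace


def toy_instance(open_variant=False):
    return State(
        coords={"D0": (0, 0), "D1": (10, 0), 1: (1, 0), 2: (2, 0)},
        demand={1: 1, 2: 1},
        depot_cap={"D0": 5, "D1": 5},
        open_cost={"D0": 3.0, "D1": 5.0},
        vehicle_cap=2,
        open_variant=open_variant,
    )
```

Calling `rollout(toy_instance(), greedy_policy)` walks through one location decision and three routing decisions. In this sketch the only difference between the closed and open variants is whether the return leg to the depot is charged, which is one simple way to share a single MDP between CLRP and OCLRP.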
Abstract
Capacitated location-routing problems (CLRPs) are classical combinatorial optimization problems that require making location and routing decisions simultaneously. Their complex constraints and the intricate relationships between decisions make them challenging to solve. Deep reinforcement learning (DRL) has been applied extensively to the vehicle routing problem and its variants, but its use for CLRPs remains largely unexplored. In this paper, we propose DRL with heterogeneous query (DRLHQ) to solve the CLRP and the open CLRP (OCLRP). We are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate CLRPs as a Markov decision process tailored to the different decision types, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency between location and routing decisions, we also introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to the different decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate superior solution quality and better generalization performance of our approach over representative traditional and DRL-based baselines on both CLRP and OCLRP.
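The idea of a stage-dependent query can be illustrated with a small single-head pointer-attention sketch. This is an assumption-laden illustration, not the DRLHQ architecture: it supposes that the decoder keeps one query projection per decision stage while sharing the key projection over node embeddings, and that infeasible nodes are masked before the softmax. The class name `HeterogeneousQueryPointer`, its parameters, and the random initialization are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def masked_softmax(scores, mask):
    """Softmax over entries where mask is True; masked entries get 0."""
    feasible = [s for s, ok in zip(scores, mask) if ok]
    if not feasible:
        raise ValueError("mask excludes every node")
    m = max(feasible)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    z = sum(exps)
    return [e / z for e in exps]


class HeterogeneousQueryPointer:
    """One query projection per stage, one shared key projection (assumed)."""

    def __init__(self, dim, stages=("location", "routing"), seed=0):
        rng = random.Random(seed)
        scale = dim ** -0.5

        def rand_matrix():
            return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.W_q = {stage: rand_matrix() for stage in stages}
        self.W_k = rand_matrix()

    def probs(self, stage, context, node_embs, mask):
        # The stage selects which projection turns the context into a query.
        q = matvec(self.W_q[stage], context)
        keys = [matvec(self.W_k, h) for h in node_embs]
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(self.dim)
            for k in keys
        ]
        return masked_softmax(scores, mask)


pointer = HeterogeneousQueryPointer(dim=4, seed=0)
context = [0.5, -1.0, 0.25, 2.0]                      # toy decoder context
nodes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]    # toy node embeddings
mask = [True, False, True]                            # node 1 is infeasible
p_location = pointer.probs("location", context, nodes, mask)
p_routing = pointer.probs("routing", context, nodes, mask)
```

With the same context and node embeddings, the two stages produce different selection distributions because only the query projection changes. In a trained model the projections would be learned jointly with the encoder rather than sampled at random as they are here.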