Deep Reinforcement Learning for Traveling Purchaser Problems

๐Ÿ“… 2024-04-03
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
The Traveling Purchaser Problem (TPP) suffers from strong coupling between routing and purchasing decisions, making it challenging for existing methods to balance solution accuracy and computational efficiency. This paper proposes a decoupled deep reinforcement learning (DRL) framework: first, modeling the TPP as a bipartite graph and optimizing the global route via DRL with a bipartite graph neural network; second, generating the purchasing plan efficiently using linear programming once the route is fixed. The authors introduce the first decoupled optimization paradigm for the TPP, augmented with a meta-learning strategy that enables generalization across problem scales and demand distributions, even to instances significantly larger than those seen during training. Evaluated on synthetic benchmarks and the TPPLIB dataset, the approach reduces optimality gaps by 40%-90% compared to classical heuristics and achieves substantial speedups on large-scale instances.

๐Ÿ“ Abstract
The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant benefit of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, leveraging DRL, we can train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.
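The decoupling described above means that once the policy network has fixed a route (i.e., a subset of markets to visit), the purchasing plan reduces to a small linear program: buy each demanded product from the visited markets at minimum total cost, subject to per-market availability. The sketch below illustrates this LP step only, on a made-up instance; the market count, prices, availabilities, and demands are all hypothetical, and the paper's actual formulation may differ in details.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance: a fixed route visits 3 markets; 2 products are demanded.
prices = np.array([[4.0, 9.0],   # prices[m, k]: unit cost of product k at market m
                   [6.0, 5.0],
                   [3.0, 7.0]])
avail  = np.array([[2.0, 1.0],   # avail[m, k]: units of product k stocked at market m
                   [3.0, 4.0],
                   [5.0, 2.0]])
demand = np.array([4.0, 3.0])    # required units of each product

M, K = prices.shape
c = prices.ravel()               # decision variables x[m, k], flattened row-major

# Equality constraints: for each product k, sum over markets of x[m, k] == demand[k]
A_eq = np.zeros((K, M * K))
for k in range(K):
    A_eq[k, k::K] = 1.0          # picks out x[0, k], x[1, k], ..., x[M-1, k]

# Bounds: 0 <= x[m, k] <= avail[m, k]
bounds = [(0.0, a) for a in avail.ravel()]

res = linprog(c, A_eq=A_eq, b_eq=demand, bounds=bounds)
plan = res.x.reshape(M, K)       # purchasing plan for the given route
print(res.status, res.fun)       # status 0 means an optimal plan was found
```

Because this subproblem is a plain LP, it solves in negligible time even for many candidate routes, which is what lets the DRL policy be trained against the true global objective (travel cost plus purchase cost) rather than a surrogate.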
Problem

Research questions and friction points this paper is trying to address.

Solves high computational cost in TPP exact methods
Improves limited performance of TPP heuristic designs
Enhances generalization across varying TPP instance sizes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bipartite graph represents market-product relations
Policy network constructs route sequentially
Meta-learning enables generalization across sizes
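The bipartite market-product representation named above can be pictured as a graph with markets on one side, products on the other, and an edge wherever a market stocks a product, weighted by price and quantity. A minimal sketch, with illustrative names and data (not the paper's actual encoding, which feeds such a graph into a neural network):

```python
from collections import defaultdict

def build_bipartite(offers):
    """Build both sides of the market-product bipartite graph.

    offers: list of (market, product, price, quantity) tuples.
    Returns (market_adj, product_adj) adjacency maps.
    """
    market_adj = defaultdict(list)   # market  -> [(product, price, qty), ...]
    product_adj = defaultdict(list)  # product -> [(market, price, qty), ...]
    for m, k, price, qty in offers:
        market_adj[m].append((k, price, qty))
        product_adj[k].append((m, price, qty))
    return market_adj, product_adj

# Hypothetical offers for a tiny instance.
offers = [("m0", "apple", 4.0, 2), ("m1", "apple", 6.0, 3),
          ("m1", "pear", 5.0, 4), ("m2", "apple", 3.0, 5)]
markets, products = build_bipartite(offers)
print(products["apple"])  # every market offering apples, with price and stock
```

Keeping both adjacency directions makes the coupling explicit: a routing decision (which markets to visit) immediately determines which product edges remain available to the purchasing plan.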
๐Ÿ”Ž Similar Papers
No similar papers found.
Haofeng Yuan
Department of Automation & BNRist, Tsinghua University, Beijing 100084, China
Rong-Zhu Zhu
Department of Automation & BNRist, Tsinghua University, Beijing 100084, China
Wanlu Yang
Department of Automation & BNRist, Tsinghua University, Beijing 100084, China
Shiji Song
Tsinghua University
Modeling and optimization, complex systems, and stochastic systems
Keyou You
Department of Automation & BNRist, Tsinghua University, Beijing 100084, China
Yuli Zhang
School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China