AI Summary
Existing evaluations of deep reinforcement learning (DRL) for combinatorial optimization (CO) suffer from a lack of standardized benchmarks, leading to incomparable results, poor reproducibility, and high entry barriers. To address this, we introduce a standardized DRL benchmark framework designed specifically for CO, comprising 27 distinct CO environments and 23 state-of-the-art baseline algorithms. The framework supports modular configuration of environments, policy architectures, and RL algorithms, and is implemented with PyTorch, PyTorch Lightning, and Hydra. It integrates mainstream techniques including Monte Carlo policy gradients (MC-PG), actor-critic methods, pointer networks, and graph neural networks (GNNs). A central design principle is the decoupling of research experimentation from engineering implementation, substantially improving reproducibility and development efficiency. Since being open-sourced, it has attracted numerous researchers in the community and continues to be extended by contributors.
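The modular, config-driven composition described above can be sketched in plain Python. This is a conceptual illustration only: the registry names, classes, and config keys below are hypothetical stand-ins, not RL4CO's actual API, but they show how Hydra-style nested configs let environments, policies, and RL algorithms be swapped independently.

```python
# Hypothetical sketch of config-driven modularity (names are illustrative,
# NOT RL4CO's real API): components are registered by name and composed
# from one nested config, so swapping one piece never touches the others.
ENVS, POLICIES, ALGOS = {}, {}, {}

def register(registry, name):
    """Class decorator that records a component under a string key."""
    def deco(cls):
        registry[name] = cls
        return cls
    return deco

@register(ENVS, "tsp")
class TSPEnv:
    def __init__(self, num_loc=20):
        self.num_loc = num_loc

@register(POLICIES, "attention")
class AttentionPolicy:
    def __init__(self, embed_dim=128):
        self.embed_dim = embed_dim

@register(ALGOS, "reinforce")
class Reinforce:
    def __init__(self, env, policy, lr=1e-4):
        self.env, self.policy, self.lr = env, policy, lr

def build(cfg):
    """Compose an experiment from a Hydra-style nested config dict."""
    env = ENVS[cfg["env"]["name"]](**cfg["env"].get("kwargs", {}))
    policy = POLICIES[cfg["policy"]["name"]](**cfg["policy"].get("kwargs", {}))
    return ALGOS[cfg["algo"]["name"]](env, policy, **cfg["algo"].get("kwargs", {}))

cfg = {
    "env": {"name": "tsp", "kwargs": {"num_loc": 50}},
    "policy": {"name": "attention", "kwargs": {"embed_dim": 256}},
    "algo": {"name": "reinforce", "kwargs": {"lr": 3e-4}},
}
experiment = build(cfg)
print(experiment.env.num_loc, experiment.policy.embed_dim)  # prints: 50 256
```

Changing the benchmark problem or the learning algorithm then amounts to editing one config entry, which is the kind of separation of science from engineering the framework aims for.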
Abstract
Combinatorial optimization (CO) is fundamental to many real-world applications, from logistics and scheduling to hardware design and resource allocation. Deep reinforcement learning (RL) has recently shown significant benefits in solving CO problems, reducing reliance on domain expertise and improving computational efficiency. However, the absence of a unified benchmarking framework leads to inconsistent evaluations, limits reproducibility, and increases engineering overhead, raising barriers to adoption for new researchers. To address these challenges, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 27 CO problem environments and 23 state-of-the-art baselines. Built on efficient software libraries and implementation best practices, RL4CO features modularized implementations and flexible configuration of diverse environments, policy architectures, RL algorithms, and utilities, with extensive documentation. RL4CO helps researchers build on existing successes while exploring and developing their own designs, facilitating the entire research process by decoupling science from heavy engineering. We also provide extensive benchmark studies to inspire new insights and future work. RL4CO has already attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.
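To make the RL-for-CO training loop concrete, here is a minimal Monte Carlo policy-gradient (REINFORCE) sketch on a toy 5-city TSP, using only the standard library. Everything here (the tabular logits, mean-cost baseline, batch size, learning rate) is illustrative and assumed for the example; RL4CO's actual implementations use neural policies such as attention models and GNNs.

```python
import math
import random

random.seed(0)

# Toy instance: 5 random cities and their pairwise distances.
N = 5
coords = [(random.random(), random.random()) for _ in range(N)]
dist = [[math.dist(a, b) for b in coords] for a in coords]

# Tabular softmax policy: theta[i][j] is the logit for "go to j from i".
theta = [[0.0] * N for _ in range(N)]

def sample_tour():
    """Autoregressively sample a tour, masking already-visited cities."""
    tour, decisions = [0], []
    while len(tour) < N:
        i = tour[-1]
        avail = [j for j in range(N) if j not in tour]
        z = [math.exp(theta[i][j]) for j in avail]
        s = sum(z)
        probs = [w / s for w in z]
        j = random.choices(avail, probs)[0]
        decisions.append((i, avail, probs, j))
        tour.append(j)
    return tour, decisions

def tour_len(tour):
    return sum(dist[tour[k]][tour[(k + 1) % N]] for k in range(N))

lr, batch = 0.5, 32
for _ in range(200):
    samples = [sample_tour() for _ in range(batch)]
    costs = [tour_len(t) for t, _ in samples]
    baseline = sum(costs) / batch  # shared mean baseline reduces variance
    for (tour, decisions), c in zip(samples, costs):
        adv = c - baseline  # descend on cost-weighted log-likelihood
        for i, avail, probs, j in decisions:
            for a, p in zip(avail, probs):
                grad = (1.0 if a == j else 0.0) - p  # d log pi / d theta[i][a]
                theta[i][a] -= lr / batch * adv * grad

print("best cost in final batch:", round(min(costs), 3))
```

The same loop structure (sample solutions, score them, update the policy against a baseline) underlies the neural MC-PG baselines benchmarked in the paper; only the policy parameterization changes.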