Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

📅 2026-05-18

🤖 AI Summary
This work addresses three obstacles to using deep reinforcement learning for combinatorial optimization problems such as job shop scheduling: opaque policies, high computational cost, and difficulty incorporating human prior knowledge. It proposes ProRL, a framework that represents scheduling policies as structured, human-readable, and editable programs written in a domain-specific language, DSL-S. ProRL uses local search to explore the space of program structures and Bayesian optimization to learn the parameters that complete each candidate program. Because the learned programs select among scheduling heuristic rules, the framework naturally incorporates heuristics already used in industry. On widely used benchmark instances it performs strongly against both conventional heuristics and deep reinforcement learning baselines, and it remains effective under tightly constrained computational resources, such as training with only 100 episodes.
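To make "human-readable and editable programs" concrete, here is a minimal Python sketch of what a programmatic scheduling policy could look like. It is not DSL-S syntax, which this summary does not show. The `Op` record, the queue-length condition, and the `threshold` parameter are hypothetical names chosen for illustration; SPT (shortest processing time) and MWKR (most work remaining) are standard dispatching rules from the scheduling literature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """A candidate operation waiting to be dispatched (hypothetical record)."""
    job: int
    duration: int        # processing time of this operation
    remaining_work: int  # processing time left in the job, including this op


def spt(candidates: List[Op]) -> Op:
    """Shortest processing time first."""
    return min(candidates, key=lambda o: o.duration)


def mwkr(candidates: List[Op]) -> Op:
    """Most work remaining first."""
    return max(candidates, key=lambda o: o.remaining_work)


def policy(candidates: List[Op], threshold: int = 3) -> Op:
    """Readable program: if the queue is long, clear short ops; else favor long jobs.

    The condition and the threshold are illustrative assumptions, not taken
    from the paper. A human can read this rule and edit it directly.
    """
    if len(candidates) > threshold:
        return spt(candidates)
    return mwkr(candidates)


queue = [Op(job=0, duration=5, remaining_work=20),
         Op(job=1, duration=2, remaining_work=8),
         Op(job=2, duration=4, remaining_work=30)]
short_queue_choice = policy(queue, threshold=3)  # queue not longer than 3: MWKR
long_queue_choice = policy(queue, threshold=2)   # queue longer than 2: SPT
```

With three queued operations, a threshold of 3 falls through to MWKR and picks job 2, while a threshold of 2 triggers SPT and picks job 1. The point of the sketch is that both the branching structure and the numeric parameter are visible and editable, unlike the weights of a neural policy.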
📝 Abstract
Deep reinforcement learning (DRL) has recently emerged as a promising approach to solving combinatorial optimization problems such as job shop scheduling. However, the policies learned by DRL are typically represented by deep neural networks (DNNs), whose opaque neural architectures and non-interpretable policy decisions can lead to critical trust and usability concerns for human decision makers. In addition, the computational requirements of DNNs can further hinder practical deployment in resource-constrained environments. In this work, we propose ProRL, a novel interpretable programmatic reinforcement learning framework that achieves high-performance scheduling with human-readable and editable programmatic policies (i.e., programs). We first introduce a domain-specific language for scheduling (DSL-S) to represent scheduling strategies as structured programs. ProRL then explores the program space defined by DSL-S using local search to identify incomplete programs, which are subsequently completed by learning their parameters via Bayesian optimization. ProRL learns which scheduling heuristic rules to select, and hence, it naturally incorporates existing heuristics already used in industrial scenarios. Experiments on widely used benchmark instances demonstrate the strong performance of ProRL against existing heuristics and DRL baselines. Furthermore, ProRL performs well under strongly constrained computational resources, such as training with only 100 episodes. Our code is available at https://github.com/HcPlu/ProRL.
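The two-level search described in the abstract (local search over incomplete programs, then parameter learning to complete them) can be sketched on a toy instance. Everything below is an assumption for illustration and not the authors' implementation: the three-job instance, the program template `if progress < theta then RULE_A else RULE_B`, the `progress` feature, and all function names are hypothetical. Seeded random search stands in for Bayesian optimization to keep the sketch dependency-free.

```python
import random

# Toy instance (hypothetical): each job is a sequence of (machine, duration) ops.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]

# Dispatching rules as priority keys; the lowest key is dispatched first.
RULES = {
    "SPT": lambda dur, rem: dur,    # shortest processing time
    "LPT": lambda dur, rem: -dur,   # longest processing time
    "MWKR": lambda dur, rem: -rem,  # most work remaining
}


def makespan(rule_a, rule_b, theta, jobs=JOBS):
    """Run the program 'if progress < theta then rule_a else rule_b'."""
    nxt = [0] * len(jobs)        # index of each job's next operation
    job_ready = [0] * len(jobs)  # time at which each job is free
    mach_ready = {}              # time at which each machine is free
    total = sum(len(j) for j in jobs)
    for done in range(total):
        rule = RULES[rule_a] if done / total < theta else RULES[rule_b]
        cands = [j for j in range(len(jobs)) if nxt[j] < len(jobs[j])]

        def priority(j):
            dur = jobs[j][nxt[j]][1]
            rem = sum(d for _, d in jobs[j][nxt[j]:])
            return (rule(dur, rem), j)  # job index breaks ties

        j = min(cands, key=priority)
        m, d = jobs[j][nxt[j]]
        start = max(job_ready[j], mach_ready.get(m, 0))
        job_ready[j] = mach_ready[m] = start + d
        nxt[j] += 1
    return max(job_ready)


def complete(rule_a, rule_b, rng, trials=20):
    """Fill the theta hole. Random search is a stand-in for Bayesian optimization."""
    thetas = [rng.random() for _ in range(trials)]
    return min((makespan(rule_a, rule_b, t), t) for t in thetas)


def local_search(seed=0, steps=15):
    """Hill-climb over program structures, completing each candidate's parameter."""
    rng = random.Random(seed)
    names = sorted(RULES)
    current = (rng.choice(names), rng.choice(names))
    best_cost, best_theta = complete(*current, rng)
    for _ in range(steps):
        neighbor = list(current)
        neighbor[rng.randrange(2)] = rng.choice(names)  # mutate one rule slot
        cost, theta = complete(*neighbor, rng)
        if cost < best_cost:
            current, best_cost, best_theta = tuple(neighbor), cost, theta
    return current, best_theta, best_cost


(best_a, best_b), best_theta, best_cost = local_search(seed=0)
print(f"if progress < {best_theta:.2f} then {best_a} else {best_b} -> makespan {best_cost}")
```

The outer loop only changes the program's structure (which rule fills each slot), while the inner `complete` call tunes the numeric parameter for that structure. Swapping `complete` for a real Bayesian optimizer would change how `theta` is proposed but not the overall shape of the search.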
Problem

Research questions and friction points this paper is trying to address.

interpretable reinforcement learning
job shop scheduling
programmatic policies
combinatorial optimization
deep reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

interpretable reinforcement learning
programmatic policies
domain-specific language
Bayesian optimization
job shop scheduling