An efficient deep reinforcement learning environment for flexible job-shop scheduling

📅 2025-09-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep reinforcement learning (DRL) environments for the flexible job-shop scheduling problem (FJSP) are often oversimplified or neglected, which hinders policy generalization and performance. Method: This paper constructs a temporally grounded (chronological) DRL environment based on discrete-event simulation and proposes an end-to-end Proximal Policy Optimization (PPO) scheduling framework. It introduces a lightweight state representation and an interpretable reward function based on machine scheduling areas. Contribution/Results: By jointly optimizing state compression and environment-driven reward shaping, the proposed method improves the performance of simple priority dispatching rules and achieves competitive results against OR-Tools, state-of-the-art meta-heuristics, and existing DRL approaches across multiple public benchmark instances. The results empirically support the claim that fine-grained, simulation-informed environment modeling is critical to improving both the efficiency and accuracy of scheduling decisions in FJSP.

📝 Abstract
The Flexible Job-shop Scheduling Problem (FJSP) is a classical combinatorial optimization problem with a wide range of real-world applications. To generate fast and accurate scheduling solutions for FJSP, various deep reinforcement learning (DRL) scheduling methods have been developed. However, these methods mainly focus on the design of the DRL scheduling agent, overlooking the modeling of the DRL environment. This paper presents a simple chronological DRL environment for FJSP based on discrete-event simulation, and an end-to-end DRL scheduling model is proposed based on proximal policy optimization (PPO). Furthermore, a compact novel state representation of FJSP is proposed based on two state variables in the scheduling environment, and a novel comprehensible reward function is designed based on the scheduling areas of machines. Experimental results on public benchmark instances show that the performance of simple priority dispatching rules (PDR) is improved in our scheduling environment, and our DRL scheduling model obtains competitive performance compared with OR-Tools, meta-heuristic, DRL, and PDR scheduling methods.
Problem

Research questions and friction points this paper is trying to address.

Developing efficient deep reinforcement learning for flexible job-shop scheduling
Addressing overlooked environment modeling in DRL scheduling methods
Proposing novel state representation and reward function for FJSP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chronological DRL environment using discrete event simulation
Proximal policy optimization (PPO) based end-to-end model
Novel state representation and comprehensible machine-area reward
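The chronological environment described above can be sketched as a small discrete-event loop: at each decision point the agent assigns an eligible (job, machine) pair, and the simulation clock advances as operations finish. The code below is an illustrative sketch only, not the paper's implementation; the two-variable state (completion ratio and current makespan) and the negative-makespan-increase reward are assumed stand-ins for the paper's lightweight state representation and machine-scheduling-area reward.

```python
class FJSPEnv:
    """Minimal chronological FJSP environment (illustrative sketch).

    Each job is a list of operations; each operation is a dict mapping
    eligible machine ids to processing times.
    """

    def __init__(self, jobs, num_machines):
        self.jobs = jobs
        self.num_machines = num_machines
        self.reset()

    def reset(self):
        self.next_op = [0] * len(self.jobs)        # next operation index per job
        self.job_ready = [0.0] * len(self.jobs)    # earliest start of each job's next op
        self.machine_free = [0.0] * self.num_machines
        self.done_ops = 0
        self.total_ops = sum(len(ops) for ops in self.jobs)
        self.makespan = 0.0
        return self._state()

    def _state(self):
        # Assumed compact two-variable state: completion ratio and makespan.
        return (self.done_ops / self.total_ops, self.makespan)

    def legal_actions(self):
        """All (job, machine) pairs that can be scheduled next."""
        acts = []
        for j, ops in enumerate(self.jobs):
            k = self.next_op[j]
            if k < len(ops):
                acts.extend((j, m) for m in ops[k])
        return acts

    def step(self, action):
        j, m = action
        proc = self.jobs[j][self.next_op[j]][m]
        start = max(self.job_ready[j], self.machine_free[m])
        finish = start + proc
        self.job_ready[j] = self.machine_free[m] = finish
        self.next_op[j] += 1
        self.done_ops += 1
        # Dense reward: negative makespan increase (a simple shaping proxy,
        # not the paper's machine-scheduling-area reward).
        reward = self.makespan - max(self.makespan, finish)
        self.makespan = max(self.makespan, finish)
        done = self.done_ops == self.total_ops
        return self._state(), reward, done
```

A PPO agent would sit on top of this loop, masking illegal actions via `legal_actions()`; because the per-step rewards telescope, their sum equals the negative final makespan, so maximizing return minimizes makespan.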
Xinquan Wu
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Xuefeng Yan
Molecular Imaging Branch, National Institute of Mental Health, National Institutes of Health
Molecular imaging
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision · Multimodal Fusion · Computer Graphics · Deep Geometry Learning · CAD
Donghai Guan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China