An efficient deep reinforcement learning environment for flexible job-shop scheduling

📅 2025-09-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep reinforcement learning (DRL) environments for the flexible job-shop scheduling problem (FJSP) are often oversimplified or neglected, which hinders policy generalization and performance. Method: This paper constructs a temporally grounded (chronological) DRL environment based on discrete-event simulation and proposes an end-to-end Proximal Policy Optimization (PPO) scheduling framework. It introduces a lightweight state representation and an interpretable reward function based on machine scheduling areas. Contribution/Results: By jointly optimizing state compression and environment-driven reward shaping, the proposed method improves the performance of simple priority dispatching rules and achieves competitive results against OR-Tools, state-of-the-art meta-heuristics, and existing DRL approaches across multiple public benchmark instances. The results empirically support the claim that fine-grained, simulation-informed environment modeling is critical to improving both the efficiency and accuracy of scheduling decisions in FJSP.

📝 Abstract
The Flexible Job-shop Scheduling Problem (FJSP) is a classical combinatorial optimization problem with a wide range of real-world applications. To generate fast and accurate scheduling solutions for FJSP, various deep reinforcement learning (DRL) scheduling methods have been developed. However, these methods mainly focus on the design of the DRL scheduling agent, overlooking the modeling of the DRL environment. This paper presents a simple chronological DRL environment for FJSP based on discrete-event simulation, and an end-to-end DRL scheduling model is proposed based on proximal policy optimization (PPO). Furthermore, a compact novel state representation of FJSP is proposed based on two state variables in the scheduling environment, and a novel comprehensible reward function is designed based on the scheduling areas of machines. Experimental results on public benchmark instances show that the performance of simple priority dispatching rules (PDR) is improved in our scheduling environment, and our DRL scheduling model obtains competitive performance compared with OR-Tools, meta-heuristic, DRL, and PDR scheduling methods.
Problem

Research questions and friction points this paper is trying to address.

Developing efficient deep reinforcement learning for flexible job-shop scheduling
Addressing overlooked environment modeling in DRL scheduling methods
Proposing novel state representation and reward function for FJSP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chronological DRL environment using discrete event simulation
Proximal policy optimization (PPO) based end-to-end model
Novel state representation and comprehensible machine-area reward
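The chronological environment described above can be sketched as a small discrete-event loop: at each decision point the agent assigns an eligible (job, machine) pair, and the simulation clock advances as operations finish. The code below is an illustrative sketch only, not the paper's implementation; the two-variable state (completion ratio and current makespan) and the negative-makespan-increase reward are assumed stand-ins for the paper's lightweight state representation and machine-scheduling-area reward.

```python
class FJSPEnv:
    """Minimal chronological FJSP environment (illustrative sketch).

    Each job is a list of operations; each operation is a dict mapping
    eligible machine ids to processing times.
    """

    def __init__(self, jobs, num_machines):
        self.jobs = jobs
        self.num_machines = num_machines
        self.reset()

    def reset(self):
        self.next_op = [0] * len(self.jobs)        # next operation index per job
        self.job_ready = [0.0] * len(self.jobs)    # earliest start of each job's next op
        self.machine_free = [0.0] * self.num_machines
        self.done_ops = 0
        self.total_ops = sum(len(ops) for ops in self.jobs)
        self.makespan = 0.0
        return self._state()

    def _state(self):
        # Assumed compact two-variable state: completion ratio and makespan.
        return (self.done_ops / self.total_ops, self.makespan)

    def legal_actions(self):
        """All (job, machine) pairs that can be scheduled next."""
        acts = []
        for j, ops in enumerate(self.jobs):
            k = self.next_op[j]
            if k < len(ops):
                acts.extend((j, m) for m in ops[k])
        return acts

    def step(self, action):
        j, m = action
        proc = self.jobs[j][self.next_op[j]][m]
        start = max(self.job_ready[j], self.machine_free[m])
        finish = start + proc
        self.job_ready[j] = self.machine_free[m] = finish
        self.next_op[j] += 1
        self.done_ops += 1
        # Dense reward: negative makespan increase (a simple shaping proxy,
        # not the paper's machine-scheduling-area reward).
        reward = self.makespan - max(self.makespan, finish)
        self.makespan = max(self.makespan, finish)
        done = self.done_ops == self.total_ops
        return self._state(), reward, done
```

A PPO agent would sit on top of this loop, masking illegal actions via `legal_actions()`; because the per-step rewards telescope, their sum equals the negative final makespan, so maximizing return minimizes makespan.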
Xinquan Wu
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Xuefeng Yan
Molecular Imaging Branch, National Institute of Mental Health, National Institutes of Health
Molecular imaging
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision · Multimodal Fusion · Computer Graphics · Deep Geometry Learning · CAD
Donghai Guan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China