🤖 AI Summary
This work addresses the challenge of safely deploying deep reinforcement learning (DRL) in industrial batch process control, where stochastic actions and the absence of stability guarantees hinder reliable operation. The authors propose a two-layer architecture that combines iterative learning control (ILC) with DRL. At the inter-batch level, ILC ensures closed-loop stability and compensates for repetitive disturbances; at the intra-batch level, DRL adapts to non-repetitive disturbances. A Kalman filter provides state estimates to the learning agent, guiding it toward policies that satisfy operational constraints and maintain system stability. The approach is presented as the first to integrate ILC’s inherent stability mechanisms into a DRL framework, aiming for control that combines stability, safety, and adaptability under diverse disturbance conditions.
📝 Abstract
A significant limitation of Deep Reinforcement Learning (DRL) is the stochastic uncertainty in the actions generated during exploration and exploitation, which poses substantial safety risks during both training and deployment. In industrial process control, the lack of formal stability and convergence guarantees further inhibits adoption of DRL methods by practitioners. Conversely, Iterative Learning Control (ILC) is a well-established autonomous control methodology for repetitive systems, particularly in batch process optimization. ILC achieves the desired control performance by iteratively refining the control law, either between consecutive batches or within individual batches, to compensate for both repetitive and non-repetitive disturbances. This study introduces an Iterative Learning Control-Informed Reinforcement Learning (IL-CIRL) framework for training DRL controllers in dual-layer batch-to-batch and within-batch control architectures for batch processes. The proposed method incorporates Kalman filter-based state estimation within the iterative learning structure to guide DRL agents toward control policies that satisfy operational constraints and provide stability guarantees. This approach enables the systematic design of DRL controllers for batch processes operating under multiple disturbance conditions.
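To make the batch-to-batch refinement idea concrete, the following is a minimal sketch of a textbook P-type ILC update, not the IL-CIRL method itself. It assumes a hypothetical scalar plant x[t+1] = a*x[t] + b*u[t] + d[t] with a disturbance d that repeats identically in every batch; the plant parameters, the learning gain, and all function names are illustrative assumptions and do not come from the paper. The DRL layer and the Kalman filter are omitted.

```python
import numpy as np


def run_batch(u, d, a=0.3, b=0.5, x0=0.0):
    """Simulate one batch of the assumed plant x[t+1] = a*x[t] + b*u[t] + d[t].

    Returns y with y[t] = x[t+1], i.e. the output that input u[t] affects first.
    """
    x = x0
    y = np.empty(len(u))
    for t in range(len(u)):
        x = a * x + b * u[t] + d[t]
        y[t] = x
    return y


def p_type_ilc(r, d, n_batches=30, gain=1.0):
    """Batch-to-batch P-type ILC: u_{k+1}[t] = u_k[t] + gain * e_k[t].

    e_k[t] = r[t] - y_k[t] is the tracking error of batch k. For this plant
    the error contracts from batch to batch when |1 - gain*b| < 1 and the
    dynamics are fast enough (true for the assumed a, b, and gain).
    """
    u = np.zeros(len(r))
    errors = []
    for _ in range(n_batches):
        e = r - run_batch(u, d)            # error over the whole batch
        errors.append(float(np.max(np.abs(e))))
        u = u + gain * e                   # refine the input for the next batch
    return u, errors


N = 20
r = np.ones(N)                                      # reference trajectory (assumed)
d = 0.1 * np.sin(np.linspace(0.0, 2.0 * np.pi, N))  # repetitive disturbance (assumed)
u, errors = p_type_ilc(r, d)
print("max error, first batch:", errors[0])
print("max error, last batch: ", errors[-1])
```

Because the disturbance is identical in every batch, the update learns an input that cancels it without modelling it explicitly. A disturbance that changes from batch to batch would not be removed this way, which is the gap the within-batch DRL layer is described as addressing.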