🤖 AI Summary
To address the low sample efficiency of reinforcement learning (RL) when tuning data-driven economic nonlinear model predictive control (eNMPC), this paper combines a model-based RL algorithm with the authors' previously published method that turns Koopman (e)NMPCs into automatically differentiable control policies. The resulting end-to-end framework jointly tunes the dynamic model and the control policy for closed-loop performance, rather than relying on system identification alone, and it can incorporate partial physics-informed prior knowledge to further improve data efficiency. In an eNMPC case study on a continuous stirred-tank reactor (CSTR) model from the literature, the approach achieves better control performance and higher sample efficiency than both data-driven eNMPC based on pure system identification and neural network controllers trained with model-based RL.
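For concreteness, here is a minimal sketch of the kind of Koopman surrogate model with control inputs that such a framework learns: a nonlinear encoder lifts the state into a latent space where the dynamics are linear. The architecture, dimensions, and training data below are illustrative placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn

class KoopmanModel(nn.Module):
    """Koopman surrogate with control inputs: an encoder lifts the state
    into a latent space with linear dynamics z_{t+1} = A z_t + B u_t,
    and a decoder maps back to the original state space."""

    def __init__(self, state_dim: int, input_dim: int, latent_dim: int):
        super().__init__()
        # Hypothetical encoder architecture; the paper's lifting may differ.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
        )
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(input_dim, latent_dim, bias=False)
        self.decoder = nn.Linear(latent_dim, state_dim)

    def forward(self, x0: torch.Tensor, u_seq) -> torch.Tensor:
        """Roll the linear latent dynamics forward for len(u_seq) steps."""
        z = self.encoder(x0)
        preds = []
        for u in u_seq:
            z = self.A(z) + self.B(u)  # linear evolution in lifted space
            preds.append(self.decoder(z))
        return torch.stack(preds)

# Multi-step prediction loss on one trajectory snippet (random placeholders).
model = KoopmanModel(state_dim=2, input_dim=1, latent_dim=8)
x0 = torch.randn(2)
u_seq = [torch.randn(1) for _ in range(5)]
x_target = torch.randn(5, 2)
loss = nn.functional.mse_loss(model(x0, u_seq), x_target)
loss.backward()  # gradients reach the encoder, A, and B jointly
```

Because everything downstream of the encoder is linear, such a model can serve directly as the prediction model inside a (convex) eNMPC problem.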
📝 Abstract
Reinforcement learning (RL) can be used to tune data-driven (economic) nonlinear model predictive controllers ((e)NMPCs) for optimal performance in a specific control task by optimizing the dynamic model or the parameters in the policy's objective function or constraints, such as state bounds. Because the sample efficiency of RL is crucial, we improve it by combining a model-based RL algorithm with our previously published method that turns Koopman (e)NMPCs into automatically differentiable policies. We apply the approach to an eNMPC case study based on a continuous stirred-tank reactor (CSTR) model from the literature. It outperforms two benchmark methods, namely data-driven eNMPCs using models from system identification without further RL tuning of the resulting policy, and neural network controllers trained with model-based RL, achieving superior control performance with higher sample efficiency. Furthermore, utilizing partial prior knowledge about the system dynamics via physics-informed learning increases sample efficiency further.
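The key mechanism the abstract describes is differentiating through the eNMPC solve so that a closed-loop objective can tune the Koopman model end-to-end. The sketch below, building on the `KoopmanModel` above, conveys that gradient flow with an unrolled gradient-descent solver and soft state bounds; this is a simplified, hypothetical stand-in for the paper's differentiable (e)NMPC (which differentiates through the optimal control problem itself). `solve_enmpc`, the bounds, prices, and the closed-loop cost are all illustrative.

```python
import torch

def solve_enmpc(model, x0, prices, horizon=5, inner_steps=20, lr=0.1):
    """Soft-constrained economic MPC solved by unrolled gradient descent
    on the input sequence. create_graph=True keeps the inner solve on the
    autograd tape, so the returned input is differentiable w.r.t. the
    Koopman model parameters."""
    u = torch.zeros(horizon, 1, requires_grad=True)
    for _ in range(inner_steps):
        preds = model(x0, list(u))                            # predicted states
        cost = (prices * u.squeeze(-1)).sum()                 # economic objective
        cost = cost + 100.0 * torch.relu(preds - 1.0).sum()   # soft bound x <= 1
        cost = cost + 100.0 * torch.relu(-1.0 - preds).sum()  # soft bound x >= -1
        (grad,) = torch.autograd.grad(cost, u, create_graph=True)
        u = u - lr * grad                                     # unrolled update
    return u[0]  # receding horizon: apply only the first input

# One end-to-end tuning step: a placeholder closed-loop cost is
# differentiated through the controller into the model parameters.
model = KoopmanModel(state_dim=2, input_dim=1, latent_dim=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, prices = torch.randn(2), torch.rand(5)
u0 = solve_enmpc(model, x, prices)
closed_loop_cost = prices[0] * u0.sum()  # stand-in for the task objective
optimizer.zero_grad()
closed_loop_cost.backward()
optimizer.step()
```

In this setup the model is no longer fitted purely for prediction accuracy, as in system identification, but adjusted for the control objective itself, which is what allows RL tuning to recover good closed-loop performance from fewer samples.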