Sample-Efficient Reinforcement Learning of Koopman eNMPC

📅 2025-03-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the low sample efficiency of data-driven economic nonlinear model predictive control (eNMPC) in reinforcement learning, this paper proposes an end-to-end model-based RL framework that integrates Koopman operator theory with differentiable eNMPC. Our method enables automatic differentiation of Koopman-based eNMPC for the first time and incorporates physics-informed priors to enhance data efficiency. Crucially, it jointly optimizes the dynamic model and control policy without requiring full system identification. The resulting framework is both interpretable and fully differentiable, significantly reducing training data requirements. In the CSTR benchmark case study, our approach outperforms conventional data-driven eNMPC and purely neural controllers in control performance while reducing training samples by over 40%. These results demonstrate its effective trade-off between sample efficiency and closed-loop optimality.

Technology Category

Application Category

📝 Abstract
Reinforcement learning (RL) can be used to tune data-driven (economic) nonlinear model predictive controllers ((e)NMPCs) for optimal performance in a specific control task by optimizing the dynamic model or parameters in the policy's objective function or constraints, such as state bounds. However, the sample efficiency of RL is crucial, and to improve it, we combine a model-based RL algorithm with our published method that turns Koopman (e)NMPCs into automatically differentiable policies. We apply our approach to an eNMPC case study of a continuous stirred-tank reactor (CSTR) model from the literature. The approach outperforms benchmark methods, i.e., data-driven eNMPCs using models based on system identification without further RL tuning of the resulting policy, and neural network controllers trained with model-based RL, by achieving superior control performance and higher sample efficiency. Furthermore, utilizing partial prior knowledge about the system dynamics via physics-informed learning further increases sample efficiency.
Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in reinforcement learning for control tasks
Optimizing Koopman eNMPC performance using differentiable policies
Enhancing control in continuous stirred-tank reactor systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Koopman eNMPC with differentiable policies
Model-based RL for sample efficiency
Physics-informed learning enhances dynamics
🔎 Similar Papers
No similar papers found.
Daniel Mayfrank
Daniel Mayfrank
Doctoral student, Forschungszentrum Jülich GmbH, Institute of Energy and Climate Research
Optimal controlMachine learning
M
M. Velioglu
Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Energy Systems Engineering (ICE-1), Jülich 52425, Germany; RWTH Aachen University, Aachen 52062, Germany
Alexander Mitsos
Alexander Mitsos
AVT Systemverfahrenstechnik, RWTH Aachen University and Energy Systems Engineering IEK-10
process systems engineeringenergy systemsglobal optimizationbilevel optimizationprocess
M
M. Dahmen
Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Energy Systems Engineering (ICE-1), Jülich 52425, Germany