🤖 AI Summary
To address the low sample efficiency of reinforcement learning (RL) when tuning data-driven economic nonlinear model predictive control (eNMPC), this paper combines a model-based RL algorithm with the authors' previously published method that turns Koopman (e)NMPCs into automatically differentiable control policies. The resulting end-to-end framework jointly tunes the dynamic model and the control policy for closed-loop performance, rather than relying on system identification alone, and it can incorporate partial physics-informed prior knowledge to further improve data efficiency. In an eNMPC case study on a continuous stirred-tank reactor (CSTR) model from the literature, the approach achieves better control performance and higher sample efficiency than both data-driven eNMPC based on pure system identification and neural network controllers trained with model-based RL.
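For concreteness, here is a minimal sketch of the kind of Koopman surrogate model with control inputs that such a framework learns: a nonlinear encoder lifts the state into a latent space where the dynamics are linear. The architecture, dimensions, and training data below are illustrative placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn

class KoopmanModel(nn.Module):
    """Koopman surrogate with control inputs: an encoder lifts the state
    into a latent space with linear dynamics z_{t+1} = A z_t + B u_t,
    and a decoder maps back to the original state space."""

    def __init__(self, state_dim: int, input_dim: int, latent_dim: int):
        super().__init__()
        # Hypothetical encoder architecture; the paper's lifting may differ.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
        )
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(input_dim, latent_dim, bias=False)
        self.decoder = nn.Linear(latent_dim, state_dim)

    def forward(self, x0: torch.Tensor, u_seq) -> torch.Tensor:
        """Roll the linear latent dynamics forward for len(u_seq) steps."""
        z = self.encoder(x0)
        preds = []
        for u in u_seq:
            z = self.A(z) + self.B(u)  # linear evolution in lifted space
            preds.append(self.decoder(z))
        return torch.stack(preds)

# Multi-step prediction loss on one trajectory snippet (random placeholders).
model = KoopmanModel(state_dim=2, input_dim=1, latent_dim=8)
x0 = torch.randn(2)
u_seq = [torch.randn(1) for _ in range(5)]
x_target = torch.randn(5, 2)
loss = nn.functional.mse_loss(model(x0, u_seq), x_target)
loss.backward()  # gradients reach the encoder, A, and B jointly
```

Because everything downstream of the encoder is linear, such a model can serve directly as the prediction model inside a (convex) eNMPC problem.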
📝 Abstract
Reinforcement learning (RL) can be used to tune data-driven (economic) nonlinear model predictive controllers ((e)NMPCs) for optimal performance in a specific control task by optimizing the dynamic model or the parameters in the policy's objective function or constraints, such as state bounds. Because the sample efficiency of RL is crucial, we improve it by combining a model-based RL algorithm with our previously published method that turns Koopman (e)NMPCs into automatically differentiable policies. We apply the approach to an eNMPC case study based on a continuous stirred-tank reactor (CSTR) model from the literature. It outperforms two benchmark methods, namely data-driven eNMPCs using models from system identification without further RL tuning of the resulting policy, and neural network controllers trained with model-based RL, achieving superior control performance with higher sample efficiency. Furthermore, utilizing partial prior knowledge about the system dynamics via physics-informed learning increases sample efficiency further.
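The key mechanism the abstract describes is differentiating through the eNMPC solve so that a closed-loop objective can tune the Koopman model end-to-end. The sketch below, building on the `KoopmanModel` above, conveys that gradient flow with an unrolled gradient-descent solver and soft state bounds; this is a simplified, hypothetical stand-in for the paper's differentiable (e)NMPC (which differentiates through the optimal control problem itself). `solve_enmpc`, the bounds, prices, and the closed-loop cost are all illustrative.

```python
import torch

def solve_enmpc(model, x0, prices, horizon=5, inner_steps=20, lr=0.1):
    """Soft-constrained economic MPC solved by unrolled gradient descent
    on the input sequence. create_graph=True keeps the inner solve on the
    autograd tape, so the returned input is differentiable w.r.t. the
    Koopman model parameters."""
    u = torch.zeros(horizon, 1, requires_grad=True)
    for _ in range(inner_steps):
        preds = model(x0, list(u))                            # predicted states
        cost = (prices * u.squeeze(-1)).sum()                 # economic objective
        cost = cost + 100.0 * torch.relu(preds - 1.0).sum()   # soft bound x <= 1
        cost = cost + 100.0 * torch.relu(-1.0 - preds).sum()  # soft bound x >= -1
        (grad,) = torch.autograd.grad(cost, u, create_graph=True)
        u = u - lr * grad                                     # unrolled update
    return u[0]  # receding horizon: apply only the first input

# One end-to-end tuning step: a placeholder closed-loop cost is
# differentiated through the controller into the model parameters.
model = KoopmanModel(state_dim=2, input_dim=1, latent_dim=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, prices = torch.randn(2), torch.rand(5)
u0 = solve_enmpc(model, x, prices)
closed_loop_cost = prices[0] * u0.sum()  # stand-in for the task objective
optimizer.zero_grad()
closed_loop_cost.backward()
optimizer.step()
```

In this setup the model is no longer fitted purely for prediction accuracy, as in system identification, but adjusted for the control objective itself, which is what allows RL tuning to recover good closed-loop performance from fewer samples.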