🤖 AI Summary
This work targets two obstacles in learning controllers for nonlinear robotic systems: the low sample efficiency of reinforcement learning and the compounding errors of multi-step model prediction. It learns a linear lifted dynamics model via Koopman operator theory and embeds it in an actor-critic architecture. Policy gradients are estimated from one-step predictions of the learned model rather than multi-step rollouts, which yields an online mini-batch policy gradient method that reduces computational cost and limits rollout error. Across several simulated benchmarks and two hardware platforms (a Kinova Gen3 manipulator and a Unitree Go1 quadruped), the method is reported to be more sample-efficient than model-free RL baselines, to outperform model-based RL baselines in control performance, and to perform comparably to classical model-based controllers that rely on exact system dynamics.
📝 Abstract
This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms: a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.
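To make the two ideas in the abstract concrete, the sketch below shows one way they could fit together. It is illustrative only and not the authors' implementation. It assumes a hand-picked dictionary of observables (`lift`, hypothetical), fits the lifted model z_next ≈ A z + B u by ordinary least squares (the standard EDMD-with-control recipe, which the abstract does not name), and computes a mini-batch gradient from a single model step. The linear policy u = -K z, the quadratic stage cost with weights `Q` and `R`, and the quadratic critic V(z) = z' P z are all simplifying assumptions chosen so that the gradient has a closed form.

```python
import numpy as np

def lift(x):
    """Hypothetical observables: the state, its element-wise squares, and a constant."""
    x = np.atleast_2d(x)
    return np.hstack([x, x**2, np.ones((x.shape[0], 1))])

def fit_lifted_dynamics(X, U, X_next):
    """Least-squares fit of z_next ≈ A z + B u from transition data.

    X, X_next: (N, n_x) states; U: (N, n_u) inputs.
    Returns A with shape (n_z, n_z) and B with shape (n_z, n_u).
    """
    Z, Z_next = lift(X), lift(X_next)
    G = np.hstack([Z, U])                              # regressors [z, u]
    W, *_ = np.linalg.lstsq(G, Z_next, rcond=None)     # (n_z + n_u, n_z)
    n_z = Z.shape[1]
    return W[:n_z].T, W[n_z:].T

def one_step_policy_gradient(K, Zb, A, B, Q, R, P, gamma):
    """Gradient w.r.t. K of the mini-batch mean of

        z' Q z + u' R u + gamma * z_next' P z_next,

    with u = -K z and z_next = A z + B u (one model step, no rollout).
    Assumes R and P are symmetric.
    """
    Ub = -Zb @ K.T                                     # (N, n_u)
    Z_next = Zb @ A.T + Ub @ B.T                       # (N, n_z)
    cost_term = 2.0 * R @ K @ (Zb.T @ Zb)              # from u' R u
    value_term = -2.0 * gamma * B.T @ P @ (Z_next.T @ Zb)  # from the critic
    return (cost_term + value_term) / Zb.shape[0]
```

In an online setting, each incoming mini-batch of lifted states `Zb` would drive one update such as `K -= lr * one_step_policy_gradient(K, Zb, A, B, Q, R, P, gamma)`, with `lr` a step size chosen by the user. Because the model is applied only once per sample, prediction errors are not propagated through a multi-step rollout, which is the motivation the abstract gives for one-step gradient estimation.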