🤖 AI Summary
To address the slow convergence of MC-PILCO in policy optimization, this paper introduces the nonlinear trajectory optimization method iLQR into its framework for the first time, proposing an exploration-enhanced co-optimization mechanism: iLQR generates high-information initial trajectories that initialize the policy, which is then jointly optimized with a Gaussian process dynamics model. This approach addresses the twin bottlenecks of sample efficiency and optimization speed in conventional model-based reinforcement learning, significantly accelerating convergence while maintaining task success rates. On the cart-pole benchmark, the method reduces execution time by 45.9% when both approaches solve the task in four trials, achieves a 100% success rate, and remains faster overall even in cases where MC-PILCO converges in fewer iterations. These results demonstrate a joint improvement in computational efficiency and robustness.
📝 Abstract
This paper addresses the slow policy optimization convergence of Monte Carlo Probabilistic Inference for Learning Control (MC-PILCO), a state-of-the-art model-based reinforcement learning (MBRL) algorithm, by integrating it with the iterative Linear Quadratic Regulator (iLQR), a fast trajectory optimization method suitable for nonlinear systems. The proposed method, Exploration-Boosted MC-PILCO (EB-MC-PILCO), leverages iLQR to generate informative, exploratory trajectories and initialize the policy, significantly reducing the number of required optimization steps. Experiments on the cart-pole task demonstrate that EB-MC-PILCO accelerates convergence compared to standard MC-PILCO, achieving up to a $\bm{45.9\%}$ reduction in execution time when both methods solve the task in four trials. EB-MC-PILCO also maintains a $\bm{100\%}$ success rate across trials while solving the task faster, even in cases where MC-PILCO converges in fewer iterations.
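To make the trajectory-optimization side concrete, here is a minimal sketch of a generic iLQR loop: linearize the dynamics and quadratize the cost around the current rollout, run a backward Riccati pass to get feedback gains, then roll forward with a line search. This is an illustrative stand-in, not the paper's implementation — the pendulum dynamics, cost weights, and horizon below are all assumptions chosen for brevity (the paper uses a cart-pole and couples iLQR with a GP dynamics model).

```python
import numpy as np

# Assumed toy system: discretized pendulum swing-up (stand-in for cart-pole).
dt = 0.05
x_goal = np.array([np.pi, 0.0])          # upright target: angle pi, zero velocity
Q = np.diag([1.0, 0.1])                  # state cost weights (assumed)
R = np.array([[0.01]])                   # control cost weight (assumed)

def f(x, u):
    """One Euler step of pendulum dynamics."""
    th, om = x
    return np.array([th + dt * om, om + dt * (np.sin(th) + u[0])])

def f_jac(x, u):
    """Jacobians A = df/dx, B = df/du at (x, u)."""
    th, _ = x
    A = np.array([[1.0, dt], [dt * np.cos(th), 1.0]])
    B = np.array([[0.0], [dt]])
    return A, B

def total_cost(xs, us):
    c = sum((x - x_goal) @ Q @ (x - x_goal) + u @ R @ u for x, u in zip(xs, us))
    return c + (xs[-1] - x_goal) @ Q @ (xs[-1] - x_goal)

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs

def ilqr(x0, us, iters=30):
    T = len(us)
    for _ in range(iters):
        xs = rollout(x0, us)
        # Backward pass: propagate a quadratic value-function approximation.
        Vx, Vxx = 2 * Q @ (xs[-1] - x_goal), 2 * Q
        ks, Ks = [None] * T, [None] * T
        for t in reversed(range(T)):
            A, B = f_jac(xs[t], us[t])
            Qx = 2 * Q @ (xs[t] - x_goal) + A.T @ Vx
            Qu = 2 * R @ us[t] + B.T @ Vx
            Qxx = 2 * Q + A.T @ Vxx @ A
            Quu = 2 * R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            ks[t] = -np.linalg.solve(Quu, Qu)        # feedforward term
            Ks[t] = -np.linalg.solve(Quu, Qux)       # feedback gain
            Vx = Qx + Ks[t].T @ Quu @ ks[t] + Ks[t].T @ Qu + Qux.T @ ks[t]
            Vxx = Qxx + Ks[t].T @ Quu @ Ks[t] + Ks[t].T @ Qux + Qux.T @ Ks[t]
        # Forward pass with backtracking line search on the step size alpha.
        J0, alpha = total_cost(xs, us), 1.0
        while alpha > 1e-4:
            xs_new, us_new = [x0], []
            for t in range(T):
                u = us[t] + alpha * ks[t] + Ks[t] @ (xs_new[-1] - xs[t])
                us_new.append(u)
                xs_new.append(f(xs_new[-1], u))
            if total_cost(xs_new, us_new) < J0:      # accept only improvements
                us = us_new
                break
            alpha *= 0.5
        else:
            break                                    # no improving step found
    return us

x0 = np.array([0.0, 0.0])                            # hanging down, at rest
us_opt = ilqr(x0, [np.zeros(1) for _ in range(60)])
```

In EB-MC-PILCO's setting, trajectories like the one optimized above are what supply informative data and a policy initialization before the Monte Carlo gradient-based policy optimization takes over.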