Accelerating Model-Based Reinforcement Learning using Non-Linear Trajectory Optimization

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the slow convergence of policy optimization in MC-PILCO, this paper introduces the nonlinear trajectory optimization method iLQR into its framework for the first time, proposing an exploration-enhanced co-optimization mechanism: iLQR generates high-information initial trajectories that initialize the policy, which is then jointly optimized with a Gaussian process dynamics model. This approach tackles the twin bottlenecks of sample efficiency and optimization speed in conventional model-based reinforcement learning, significantly accelerating convergence while maintaining task success rates. On the cart-pole benchmark, the method reduces execution time by up to 45.9%, achieves a 100% success rate across four independent trials, and solves the task faster even in cases where standard MC-PILCO converges in fewer iterations. These results demonstrate a joint improvement in computational efficiency and robustness.
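To give a sense of the trajectory-optimization half of this pipeline, here is a minimal iLQR sketch in Python on a toy double integrator. This is not the paper's cart-pole setup or code: the dynamics, cost weights, and horizon are illustrative assumptions, and the linear system means iLQR effectively reduces to LQR, but the backward/forward pass structure is the one a method like EB-MC-PILCO would use to seed its policy.

```python
import numpy as np

# Toy double integrator (NOT the paper's cart-pole): x = [position, velocity].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])  # linear dynamics: x_next = A x + B u
B = np.array([[0.0], [dt]])
Qf = np.eye(2) * 100.0                 # terminal state cost weight (assumed)
R = np.array([[0.01]])                 # control effort weight (assumed)
goal = np.array([1.0, 0.0])
T = 30                                 # horizon (assumed)

def rollout(x0, us):
    """Simulate the trajectory produced by an open-loop control sequence."""
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)

def total_cost(xs, us):
    """0.5 * sum u'Ru + 0.5 * terminal error cost."""
    c = sum(0.5 * float(u @ R @ u) for u in us)
    e = xs[-1] - goal
    return c + 0.5 * float(e @ Qf @ e)

def ilqr(x0, us, iters=10):
    for _ in range(iters):
        xs = rollout(x0, us)
        # Backward pass: quadratic value-function recursion from the horizon.
        Vx = Qf @ (xs[-1] - goal)
        Vxx = Qf.copy()
        ks, Ks = [], []
        for t in reversed(range(T)):
            Qx = A.T @ Vx
            Qu = R @ us[t] + B.T @ Vx
            Qxx = A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, Qu)    # feedforward correction
            K = -np.linalg.solve(Quu, Qux)   # feedback gain
            ks.append(k); Ks.append(K)
            Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        ks.reverse(); Ks.reverse()
        # Forward pass: re-simulate with the updated local feedback policy.
        xs_new, us_new = [x0], []
        for t in range(T):
            u = us[t] + ks[t] + Ks[t] @ (xs_new[-1] - xs[t])
            us_new.append(u)
            xs_new.append(A @ xs_new[-1] + B @ u)
        us = np.array(us_new)
    return us

us0 = np.zeros((T, 1))
us_opt = ilqr(np.zeros(2), us0)
xs_opt = rollout(np.zeros(2), us_opt)
```

In the paper's setting, trajectories like `xs_opt` would serve two roles: as informative exploratory data for fitting the Gaussian process dynamics model, and as an initialization target for the policy, so that MC-PILCO's gradient-based optimization starts near a working solution rather than from scratch.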

📝 Abstract
This paper addresses the slow policy optimization convergence of Monte Carlo Probabilistic Inference for Learning Control (MC-PILCO), a state-of-the-art model-based reinforcement learning (MBRL) algorithm, by integrating it with the iterative Linear Quadratic Regulator (iLQR), a fast trajectory optimization method suitable for nonlinear systems. The proposed method, Exploration-Boosted MC-PILCO (EB-MC-PILCO), leverages iLQR to generate informative, exploratory trajectories and initialize the policy, significantly reducing the number of required optimization steps. Experiments on the cart-pole task demonstrate that EB-MC-PILCO accelerates convergence compared to standard MC-PILCO, achieving up to 45.9% reduction in execution time when both methods solve the task in four trials. EB-MC-PILCO also maintains a 100% success rate across trials while solving the task faster, even in cases where MC-PILCO converges in fewer iterations.
Problem

Research questions and friction points this paper is trying to address.

Improving slow policy optimization in MC-PILCO
Integrating iLQR for faster trajectory optimization
Reducing execution time while maintaining success rate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates MC-PILCO with iLQR optimization
Uses iLQR for exploratory trajectory generation
Reduces policy optimization steps significantly
Marco Cali
Department of Information Engineering, University of Padua, Via Gradenigo 6, Padua, Italy
Giulio Giacomuzzo
PhD student, University of Padova
Learning for Control, Human Robot Interaction
Ruggero Carli
Associate Professor at University of Padova
Control Theory
A. D. Libera
Department of Information Engineering, University of Padua, Via Gradenigo 6, Padua, Italy