Learning global control of underactuated systems with Model-Based Reinforcement Learning

📅 2025-04-09
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the challenge of achieving global stabilization for underactuated systems—such as the Pendubot and Acrobot—through a model-based reinforcement learning (MBRL) approach. The method extends the MC-PILCO algorithm to two-link underactuated systems for the first time, integrating Gaussian process dynamics modeling, Monte Carlo policy optimization, and Bayesian uncertainty quantification. It achieves high-fidelity model learning and end-to-end global control policy optimization using minimal interaction data—approximately 200 seconds per task—without requiring piecewise controllers or manual energy shaping. The approach demonstrates both efficacy and robustness in simulation and on real hardware platforms. It secured consecutive championship titles in the ICRA 2025 AI Olympics RealAIGym competition, achieving over a tenfold improvement in sample efficiency compared to state-of-the-art model-free methods.

📝 Abstract
This short paper describes our proposed solution for the third edition of the "AI Olympics with RealAIGym" competition, held at ICRA 2025. We employed Monte-Carlo Probabilistic Inference for Learning Control (MC-PILCO), an MBRL algorithm recognized for its exceptional data efficiency across various low-dimensional robotic tasks, including cart-pole, ball & plate, and Furuta pendulum systems. MC-PILCO optimizes a system dynamics model using interaction data, enabling policy refinement through simulation rather than direct optimization on system data. This approach has proven highly effective on physical systems, offering greater data efficiency than Model-Free (MF) alternatives. Notably, MC-PILCO won the first two editions of this competition, demonstrating its robustness in both simulated and real-world environments. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand: learning a global policy for the pendubot and acrobot systems.
Problem

Research questions and friction points this paper is trying to address.

Learning global control of underactuated robotic systems
Improving data efficiency in model-based reinforcement learning
Optimizing policies for pendubot and acrobot systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MC-PILCO for data-efficient MBRL
Optimizes dynamics model via interaction data
Refines policy through simulation, not direct data
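The loop summarized by these bullets (fit a dynamics model from interaction data, then optimize the policy on Monte Carlo rollouts of that model rather than on the real system) can be sketched on a toy problem. This is a hedged illustration, not the paper's implementation: the 1-D dynamics, the least-squares model standing in for MC-PILCO's Gaussian process, and the grid-searched linear policy are all simplifications invented here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D plant standing in for a real underactuated system.
def true_step(x, u):
    return 0.9 * x + 0.2 * np.sin(x) + 0.1 * u

# --- 1. Collect interaction data with an exploratory (random) input ---
X, U, Y = [], [], []
x = 1.0
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    y = true_step(x, u)
    X.append(x); U.append(u); Y.append(y)
    x = y

# --- 2. Fit a dynamics model from the data (least squares on features;
#        MC-PILCO itself uses Gaussian process regression here) ---
feats = np.column_stack([X, np.sin(X), U])
w, *_ = np.linalg.lstsq(feats, np.array(Y), rcond=None)

def model_step(x, u):
    return w @ np.array([x, np.sin(x), u])

# --- 3. Policy optimization via Monte Carlo rollouts in the model:
#        average cost over sampled particles, never touching the plant ---
def rollout_cost(gain, horizon=30, n_particles=20):
    cost = 0.0
    for _ in range(n_particles):
        xp = rng.normal(1.0, 0.1)            # sampled initial state
        for _ in range(horizon):
            up = np.clip(-gain * xp, -1.0, 1.0)  # simple linear policy
            xp = model_step(xp, up)
            cost += xp ** 2                   # quadratic state cost
    return cost / n_particles

# Grid search stands in for MC-PILCO's gradient-based policy update.
gains = np.linspace(0.0, 5.0, 51)
best_gain = min(gains, key=rollout_cost)
```

The optimized gain should beat the do-nothing policy when evaluated in the learned model, which is the essence of refining a policy through simulation instead of further real-system interaction.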
Niccolò Turcato
Department of Information Engineering, University of Padova, Italy

Marco Cali
Department of Information Engineering, University of Padova, Italy

A. D. Libera
Department of Information Engineering, University of Padova, Italy

Giulio Giacomuzzo
PhD student, University of Padova
Learning for Control, Human Robot Interaction

Ruggero Carli
Associate Professor at University of Padova
Control Theory

Diego Romeres
Senior Principal Research Scientist & Team Leader at Mitsubishi Electric Research Laboratories
Robotics, Machine Learning, Bayesian Estimation, Optimization