Accelerating Model-Based Reinforcement Learning with State-Space World Models

📅 2025-02-27

📈 Citations: 0

✨ Influential: 0

career value

234K/year

🤖 AI Summary

To address the prohibitively long training time and poor deployability of world models in model-based reinforcement learning (MBRL) for complex real-world scenarios, this paper proposes a state-space model (SSM)-based acceleration framework for world modeling. It is the first work to integrate SSMs into MBRL, enabling parallelizable dynamics modeling. Additionally, we introduce a privileged information distillation mechanism to enhance modeling accuracy under partial observability. The framework supports end-to-end joint optimization, balancing computational efficiency and generalization capability. Evaluated on a real-world agile quadrotor flight task, our approach accelerates world model training by up to 10× and achieves a 4× speedup in overall MBRL training, while matching state-of-the-art methods in both sample efficiency and task performance.

Technology Category

Application Category

📝 Abstract

Reinforcement learning (RL) is a powerful approach for robot learning. However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies. This is due to the noisy RL training updates and the complexity of robotic systems, which typically involve highly non-linear dynamics and noisy sensor signals. In contrast, model-based RL (MBRL) not only trains a policy but simultaneously learns a world model that captures the environment's dynamics and rewards. The world model can either be used for planning, for data collection, or to provide first-order policy gradients for training. Leveraging a world model significantly improves sample efficiency compared to model-free RL. However, training a world model alongside the policy increases the computational complexity, leading to longer training times that are often intractable for complex real-world scenarios. In this work, we propose a new method for accelerating model-based RL using state-space world models. Our approach leverages state-space models (SSMs) to parallelize the training of the dynamics model, which is typically the main computational bottleneck. Additionally, we propose an architecture that provides privileged information to the world model during training, which is particularly relevant for partially observable environments. We evaluate our method in several real-world agile quadrotor flight tasks, involving complex dynamics, for both fully and partially observable environments. We demonstrate a significant speedup, reducing the world model training time by up to 10 times, and the overall MBRL training time by up to 4 times. This benefit comes without compromising performance, as our method achieves similar sample efficiency and task rewards to state-of-the-art MBRL methods.

Problem

Research questions and friction points this paper is trying to address.

Accelerating model-based reinforcement learning

Reducing computational complexity in training

Improving sample efficiency in robotic tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

State-space models parallelize dynamics training

Architecture provides privileged world model information

Significantly reduces MBRL training time

🔎 Similar Papers

PWM: Policy Learning with Multi-Task World Models