🤖 AI Summary
Aerial robots suffer from insufficient control robustness in dynamic and adverse environments, while existing model-based reinforcement learning approaches (e.g., Dreamer) struggle with policy convergence due to poor generalization of learned world models.
Method: We propose a physics-informed world model: the quadrotor is treated as a rigid free body, the model predicts the net forces and moments acting on it, and a 6-DOF fourth-order Runge–Kutta (RK4) integrator propagates them into physically consistent state rollouts. The model is used within the Dreamer architecture, which trains a recurrent world model on replay-buffer data, and is compared against a standard RNN-based world model.
Contribution/Results: Both the physics-informed world model and the RNN baseline fit the training data well, but neither generalizes to new trajectories: state rollouts diverge rapidly, which prevents policy convergence.
📝 Abstract
Current control algorithms for aerial robots struggle with robustness in dynamic environments and adverse conditions. Model-based reinforcement learning (RL) has shown strong potential in handling these challenges while remaining sample-efficient. Additionally, Dreamer has demonstrated that online model-based RL can be achieved using a recurrent world model trained on replay buffer data. However, applying Dreamer to aerial systems has been challenging due to its sample inefficiency and the poor generalization of its dynamics models. Our work explores a physics-informed approach to world model learning and improves policy performance. The world model treats the quadcopter as a free-body system and predicts the net forces and moments acting on it, which are then passed through a 6-DOF fourth-order Runge-Kutta (RK4) integrator to predict future state rollouts. In this paper, we compare this physics-informed method to a standard RNN-based world model. Although both models perform well on the training data, we observe that they fail to generalize to new trajectories: state rollouts diverge rapidly, which prevents policy convergence.
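To make the integration step concrete, the following is a minimal NumPy sketch of a 6-DOF rigid-body RK4 update, not the authors' implementation. It rests on assumptions the abstract does not spell out: a 13-element state `[position, velocity, quaternion (w, x, y, z), body angular velocity]`, a net force expressed in the world frame (gravity included), a net moment expressed in the body frame, and both held constant over one step. The function names `quat_multiply`, `state_derivative`, and `rk4_step` are hypothetical.

```python
import numpy as np


def quat_multiply(a, b):
    """Hamilton product of two quaternions stored as [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])


def state_derivative(state, force_world, moment_body, mass, inertia):
    """Time derivative of the assumed 13-element rigid-body state."""
    vel = state[3:6]
    quat = state[6:10]
    omega = state[10:13]

    pos_dot = vel
    # Newton's second law; force_world is assumed to already include gravity.
    vel_dot = force_world / mass
    # Quaternion kinematics for a body-frame angular velocity.
    quat_dot = 0.5 * quat_multiply(quat, np.concatenate(([0.0], omega)))
    # Euler's rotation equation: I * omega_dot = M - omega x (I * omega).
    omega_dot = np.linalg.solve(
        inertia, moment_body - np.cross(omega, inertia @ omega)
    )
    return np.concatenate((pos_dot, vel_dot, quat_dot, omega_dot))


def rk4_step(state, force_world, moment_body, mass, inertia, dt):
    """One classical RK4 step with force and moment held constant over dt."""
    def f(s):
        return state_derivative(s, force_world, moment_body, mass, inertia)

    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    nxt = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    # Re-normalize so the quaternion stays a valid unit rotation.
    nxt[6:10] /= np.linalg.norm(nxt[6:10])
    return nxt
```

In a setup like the one described, a learned network would supply `force_world` and `moment_body` at every step, and calling `rk4_step` repeatedly would produce the state rollout. Because the integrator is fixed physics, any rollout divergence would then come from errors in the predicted forces and moments rather than from the state transition itself.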