MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes MAVEN, a framework that addresses the limited generalization of reinforcement learning policies for quadrotors under significant changes in dynamics, such as abrupt mass variations or severe single-rotor thrust loss. By combining meta-reinforcement learning with a novel predictive context encoder, MAVEN lets a single policy perform end-to-end agile control across diverse dynamics, inferring a latent representation of the system dynamics online from interaction history. The policy transfers zero-shot from simulation to a real quadrotor while retaining strong adaptability and maneuverability. Experiments show high-speed flight under extreme conditions, including mass variations of up to 66.7% and thrust loss of up to 70% in a single rotor, with policy training converging in under one hour.

📝 Abstract
Reinforcement learning (RL) has emerged as a powerful paradigm for achieving online agile navigation with quadrotors. Despite this success, policies trained via standard RL typically fail to generalize across significant dynamic variations, exhibiting a critical lack of adaptability. This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Our approach features a novel predictive context encoder, which learns to infer a latent representation of the system dynamics from interaction history. We demonstrate our method in agile waypoint traversal tasks under two challenging scenarios: large variations in quadrotor mass and severe single-rotor thrust loss. We leverage a GPU-vectorized simulator to distribute tasks across thousands of parallel environments, overcoming the long training times of meta-RL to converge in less than an hour. Through extensive experiments in both simulation and the real world, we validate that MAVEN achieves superior adaptation and agility. The policy successfully executes zero-shot sim-to-real transfer, demonstrating robust online adaptation by performing high-speed maneuvers despite mass variations of up to 66.7% and single-rotor thrust losses as severe as 70%.
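To make the idea of distributing tasks across parallel environments concrete, here is a minimal sketch of per-environment dynamics sampling. It is not the authors' implementation. The names `TaskParams`, `sample_task`, and `sample_tasks` are hypothetical, as are the uniform sampling ranges, which only mirror the extremes quoted in the abstract. The sketch also assumes that mass is only increased and that both perturbations are applied to the same environment, which the abstract does not state.

```python
import random
from dataclasses import dataclass

# Assumed ranges: they mirror the extremes quoted in the abstract, but the
# paper's actual sampling distributions are not given on this page.
MAX_MASS_INCREASE = 0.667  # assumption: mass is scaled up by at most 66.7%
MAX_THRUST_LOSS = 0.70     # assumption: one rotor loses at most 70% of its thrust
NUM_ROTORS = 4


@dataclass(frozen=True)
class TaskParams:
    """Hypothetical container for one environment's dynamics parameters."""
    mass_scale: float    # multiplier on the nominal mass
    thrust_scale: tuple  # per-rotor multiplier on the nominal thrust


def sample_task(rng: random.Random) -> TaskParams:
    """Draw one task: a heavier vehicle with a single degraded rotor."""
    mass_scale = 1.0 + rng.uniform(0.0, MAX_MASS_INCREASE)
    degraded = rng.randrange(NUM_ROTORS)  # index of the affected rotor
    loss = rng.uniform(0.0, MAX_THRUST_LOSS)
    thrust_scale = tuple(
        1.0 - loss if i == degraded else 1.0 for i in range(NUM_ROTORS)
    )
    return TaskParams(mass_scale, thrust_scale)


def sample_tasks(num_envs: int, seed: int = 0) -> list:
    """Assign an independently sampled task to each parallel environment."""
    rng = random.Random(seed)
    return [sample_task(rng) for _ in range(num_envs)]


# Example: one task per environment for 4096 parallel environments.
tasks = sample_tasks(4096)
```

In a meta-RL setup of this kind, the policy is never given these parameters directly. It would have to infer them from its interaction history, which is the role the abstract assigns to the predictive context encoder.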
Problem

Research questions and friction points this paper is trying to address.

quadrotor dynamics
generalization
adaptability
agile navigation
dynamic variations
Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-reinforcement learning
quadrotor agility
predictive context encoder
zero-shot sim-to-real transfer
dynamics adaptation