MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes MAVEN, a framework that addresses the limited generalization of reinforcement learning policies for quadrotors under significant changes in dynamics, such as abrupt mass variations or substantial single-motor thrust loss. By integrating meta-reinforcement learning with a novel predictive context encoder, MAVEN enables a single policy to perform end-to-end agile control across diverse dynamics by inferring system properties online from interaction history. The approach demonstrates, for the first time on a real quadrotor, strong zero-shot sim-to-real transfer with high adaptability and maneuverability. Experimental results show stable high-speed flight under extreme conditions, including mass changes of up to 66.7% or thrust loss of up to 70% in a single motor, with policy training converging in under one hour.
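To make "online inference of system properties from interaction history" concrete, here is a deliberately simplified sketch. It is not MAVEN's learned predictive context encoder: it swaps in classical least-squares identification on a hypothetical 1-D vertical model (acceleration = thrust / mass - g), and every name in it (`simulate_history`, `estimate_mass`, `G`) is an assumption made for illustration.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def simulate_history(mass, thrusts, noise_std=0.0, seed=0):
    """Return (thrust, measured vertical acceleration) pairs for a toy 1-D model.

    Hypothetical model: a = u / mass - G, plus optional Gaussian sensor noise.
    """
    rng = random.Random(seed)
    return [(u, u / mass - G + rng.gauss(0.0, noise_std)) for u in thrusts]


def estimate_mass(history):
    """Least-squares estimate of mass from an interaction history.

    Fits (a + G) = (1 / mass) * u, which has the closed-form solution
    1 / mass = sum(u * (a + G)) / sum(u * u).
    """
    num = sum(u * (a + G) for u, a in history)
    den = sum(u * u for u, _ in history)
    if den == 0.0 or num == 0.0:
        raise ValueError("history carries no information about the mass")
    return den / num


if __name__ == "__main__":
    history = simulate_history(mass=1.2, thrusts=[10.0, 12.0, 14.0])
    print(estimate_mass(history))
```

A learned encoder replaces the hand-written model with a latent representation trained from data, but the information flow is the same: recent state-action history goes in, and an estimate of the current dynamics comes out for the policy to condition on.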

📝 Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for achieving online agile navigation with quadrotors. Despite this success, policies trained via standard RL typically fail to generalize across significant dynamic variations, exhibiting a critical lack of adaptability. This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Our approach features a novel predictive context encoder, which learns to infer a latent representation of the system dynamics from interaction history. We demonstrate our method in agile waypoint traversal tasks under two challenging scenarios: large variations in quadrotor mass and severe single-rotor thrust loss. We leverage a GPU-vectorized simulator to distribute tasks across thousands of parallel environments, overcoming the long training times of meta-RL to converge in less than an hour. Through extensive experiments in both simulation and the real world, we validate that MAVEN achieves superior adaptation and agility. The policy successfully executes zero-shot sim-to-real transfer, demonstrating robust online adaptation by performing high-speed maneuvers despite mass variations of up to 66.7% and single-rotor thrust losses as severe as 70%.
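The idea of distributing tasks across parallel environments can be sketched as follows. This is a minimal illustration, not the authors' simulator code: the `DynamicsTask` type, the function names, and the sampling ranges are all assumptions. In particular, the abstract does not say whether the 66.7% mass variation is an increase or a decrease, so the sketch arbitrarily treats it as an increase over a nominal mass.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicsTask:
    """One dynamics variant assigned to a single simulated environment."""
    mass_scale: float        # multiplier on the nominal mass (1.0 = nominal)
    rotor_efficiency: tuple  # per-rotor thrust multiplier (1.0 = healthy)


def sample_task(rng, scenario, max_mass_change=0.667, max_thrust_loss=0.7):
    """Sample one task; both ranges are hypothetical, chosen to echo the abstract."""
    if scenario == "mass":
        scale = 1.0 + rng.uniform(0.0, max_mass_change)
        return DynamicsTask(scale, (1.0, 1.0, 1.0, 1.0))
    if scenario == "thrust_loss":
        failed = rng.randrange(4)  # which single rotor is degraded
        loss = rng.uniform(0.0, max_thrust_loss)
        eff = tuple(1.0 - loss if i == failed else 1.0 for i in range(4))
        return DynamicsTask(1.0, eff)
    raise ValueError(f"unknown scenario: {scenario!r}")


def assign_tasks(num_envs, scenario, seed=0):
    """Give every parallel environment its own independently sampled task."""
    rng = random.Random(seed)
    return [sample_task(rng, scenario) for _ in range(num_envs)]


if __name__ == "__main__":
    tasks = assign_tasks(4096, "thrust_loss")
    print(len(tasks), tasks[0])
```

In a GPU-vectorized simulator, the per-environment values would typically live in batched tensors rather than a Python list, so that one simulation step advances every dynamics variant at once.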

Problem

Research questions and friction points this paper is trying to address.

quadrotor dynamics

generalization

adaptability

agile navigation

dynamic variations

Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-reinforcement learning

quadrotor agility

predictive context encoder