🤖 AI Summary
To address the instability of reinforcement learning policies during zero-shot sim-to-real transfer of quadrotor control—caused by simulation-to-reality discrepancies—this paper introduces SimpleFlight, an integrated PPO training framework designed for zero-shot deployment on real hardware. The framework systematically identifies and jointly applies five critical factors: domain randomization, observation normalization, reward shaping, action smoothing, and dynamics-aware state encoding. Evaluated on the Crazyflie nano-quadrotor platform, SimpleFlight achieves stable trajectory tracking without any fine-tuning, reducing tracking error by more than 50% relative to state-of-the-art RL baselines on both smooth and infeasible zigzag trajectories. To foster reproducibility and community advancement, the authors open-source the complete implementation, pre-trained models, and their integration into Omnidrones, a high-fidelity, GPU-accelerated quadrotor simulator optimized for rapid RL training.
📝 Abstract
Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policies have emerged as a promising alternative due to their ability to directly map observations to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap, where RL-based policies often experience instability when deployed in the real world. In this paper, we investigate key factors for learning robust RL-based control policies that are capable of zero-shot deployment on real-world quadrotors. We identify five critical factors and develop a PPO-based training framework named SimpleFlight that integrates these five techniques. We validate the efficacy of SimpleFlight on the Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels on both smooth polynomial trajectories and challenging infeasible zigzag trajectories, even on quadrotors with small thrust-to-weight ratios, whereas baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into the GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.
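The abstract does not spell out how the five training factors are implemented; the details are in the paper and released code. As a rough illustration only, here is a minimal sketch of two of them—running observation normalization and an action-smoothing reward term—where the class names, update rule, and penalty coefficient are all illustrative assumptions, not SimpleFlight's actual implementation:

```python
import numpy as np

class RunningNorm:
    """Normalize observations with running mean/variance estimates
    (parallel-batch variant of Welford's algorithm). Illustrative only."""
    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = eps  # avoids division by zero before the first update
        self.eps = eps

    def update(self, obs_batch):
        # Merge batch statistics into the running estimates.
        b_mean = obs_batch.mean(axis=0)
        b_var = obs_batch.var(axis=0)
        b_count = obs_batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta**2 * self.count * b_count / total) / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)

def smoothness_penalty(action, prev_action, coef=0.1):
    """Reward term penalizing abrupt changes between consecutive actions;
    the quadratic form and coefficient are assumptions for illustration."""
    return -coef * float(np.sum((action - prev_action) ** 2))
```

In a PPO rollout loop, observations would be passed through the normalizer before the policy network, and the smoothness penalty would be added to the task reward at each step to discourage jittery motor commands.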