🤖 AI Summary
To address the instability of reinforcement learning policies during zero-shot sim-to-real transfer of quadrotor control—caused by simulation-to-reality discrepancies—this paper introduces SimpleFlight, an integrated PPO training framework designed for zero-shot deployment on real hardware. The framework systematically identifies and jointly applies five critical factors: domain randomization, observation normalization, reward shaping, action smoothing, and dynamics-aware state encoding. Evaluated on the Crazyflie nano-quadrotor platform, SimpleFlight achieves stable trajectory tracking without any fine-tuning, reducing tracking error by more than 50% relative to state-of-the-art RL baselines on both smooth and infeasible zigzag trajectories. To foster reproducibility and community advancement, the authors open-source the complete implementation, pre-trained models, and their integration into Omnidrones, a high-fidelity, GPU-accelerated quadrotor simulator optimized for rapid RL training.
📝 Abstract
Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policies have emerged as a promising alternative due to their ability to directly map observations to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap, where RL-based policies often experience instability when deployed in the real world. In this paper, we investigate key factors for learning robust RL-based control policies that are capable of zero-shot deployment on real-world quadrotors. We identify five critical factors and develop a PPO-based training framework named SimpleFlight that integrates these five techniques. We validate the efficacy of SimpleFlight on the Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels on both smooth polynomial trajectories and challenging infeasible zigzag trajectories, even on quadrotors with small thrust-to-weight ratios, whereas baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into the GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.
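The abstract does not spell out how the five training factors are implemented; the details are in the paper and released code. As a rough illustration only, here is a minimal sketch of two of them—running observation normalization and an action-smoothing reward term—where the class names, update rule, and penalty coefficient are all illustrative assumptions, not SimpleFlight's actual implementation:

```python
import numpy as np

class RunningNorm:
    """Normalize observations with running mean/variance estimates
    (parallel-batch variant of Welford's algorithm). Illustrative only."""
    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = eps  # avoids division by zero before the first update
        self.eps = eps

    def update(self, obs_batch):
        # Merge batch statistics into the running estimates.
        b_mean = obs_batch.mean(axis=0)
        b_var = obs_batch.var(axis=0)
        b_count = obs_batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta**2 * self.count * b_count / total) / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)

def smoothness_penalty(action, prev_action, coef=0.1):
    """Reward term penalizing abrupt changes between consecutive actions;
    the quadratic form and coefficient are assumptions for illustration."""
    return -coef * float(np.sum((action - prev_action) ** 2))
```

In a PPO rollout loop, observations would be passed through the normalizer before the policy network, and the smoothness penalty would be added to the task reward at each step to discourage jittery motor commands.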