What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
🤖 AI Summary
To address the instability that reinforcement learning policies exhibit during zero-shot sim-to-real transfer of quadrotor control, which stems from discrepancies between simulation and reality, this paper introduces SimpleFlight, a PPO-based training framework designed for zero-shot deployment on real hardware. The framework identifies and integrates five critical factors: domain randomization, observation normalization, reward shaping, action smoothing, and dynamics-aware state encoding. Evaluated on the Crazyflie nano-quadrotor platform, SimpleFlight tracks trajectories stably without any fine-tuning and reduces trajectory tracking error by more than 50% compared to state-of-the-art RL baselines. To support reproducibility, the authors integrate SimpleFlight into Omnidrones, a GPU-based quadrotor simulator, and open-source the code and model checkpoints.
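Action smoothing, one of the factors named in the summary, is often realized as a reward penalty on the change between successive actions. The sketch below illustrates that general idea only; it is not SimpleFlight's actual reward. The function names `smoothness_penalty` and `shaped_reward` and the weight `0.1` are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def smoothness_penalty(prev_action: Sequence[float], action: Sequence[float]) -> float:
    """Euclidean norm of the change between two successive actions."""
    if len(prev_action) != len(action):
        raise ValueError("actions must have the same dimension")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(action, prev_action)))


def shaped_reward(
    tracking_error: float,
    prev_action: Sequence[float],
    action: Sequence[float],
    smooth_weight: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Hypothetical reward: penalize tracking error and abrupt action changes."""
    return -tracking_error - smooth_weight * smoothness_penalty(prev_action, action)
```

With such a term, a policy that jumps between very different motor commands on consecutive steps earns less reward than one that reaches the same tracking error with gradual commands.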

📝 Abstract
Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policies have emerged as a promising alternative due to their ability to directly map observations to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap, where RL-based policies often experience instability when deployed in the real world. In this paper, we investigate key factors for learning robust RL-based control policies that are capable of zero-shot deployment on real-world quadrotors. We identify five critical factors and develop a PPO-based training framework named SimpleFlight, which integrates these five techniques. We validate the efficacy of SimpleFlight on a Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels across both smooth polynomial trajectories and challenging infeasible zigzag trajectories on quadrotors with a small thrust-to-weight ratio. In contrast, baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into the GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.
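To make the "more than 50% reduction in trajectory tracking error" claim concrete, the sketch below shows one way such a comparison could be computed. It assumes the error is the mean Euclidean distance between reference and flown positions, which the abstract does not specify; the helper names and all sample positions are hypothetical, not measurements from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_tracking_error(reference: Sequence[Point], flown: Sequence[Point]) -> float:
    """Mean Euclidean distance between matching reference and flown positions."""
    if len(reference) != len(flown) or not reference:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(r, f) for r, f in zip(reference, flown)) / len(reference)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up positions in metres, for illustration only.
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
baseline_flight = [(0.0, 0.2, 1.0), (1.0, 0.2, 1.0)]
policy_flight = [(0.0, 0.1, 1.0), (1.0, 0.1, 1.0)]

reduction = relative_reduction(
    mean_tracking_error(reference, baseline_flight),
    mean_tracking_error(reference, policy_flight),
)
```

In this toy case the policy's mean error is half the baseline's, so `reduction` comes out at about 0.5, which corresponds to a 50% reduction.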
Problem

Research questions and friction points this paper is trying to address.

Bridging sim-to-real gap for quadrotor RL policies
Reducing trajectory tracking error in real-world deployment
Handling high-speed and infeasible trajectories effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

PPO-based training framework SimpleFlight
Integrates five critical sim-to-real factors
GPU-based simulator Omnidrones integration
🔎 Similar Papers
2024-10-10
IEEE/RSJ International Conference on Intelligent Robots and Systems
Citations: 1
Jiayu Chen
Tsinghua University, Beijing, 100084, China
Chaoyang Yu
Tsinghua University, Beijing, 100084, China
Yuqing Xie
Tsinghua University, Beijing, 100084, China
Feng Gao
Tsinghua University, Beijing, 100084, China
Yinuo Chen
Tsinghua University, Beijing, 100084, China
Shu'ang Yu
Tsinghua University, Beijing, 100084, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
Wenhao Tang
Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
Shilong Ji
Tsinghua University
robotics, RL
Mo Mu
Tsinghua University, Beijing, 100084, China
Yi Wu
Tsinghua University, Beijing, 100084, China
Huazhong Yang
Professor of Electronics Engineering, Tsinghua University
VLSI circuits and systems, machine intelligence, wireless sensor networks, beyond-CMOS computing
Yu Wang
Tsinghua University, Beijing, 100084, China