🤖 AI Summary
This work addresses key challenges in the sim-to-real transfer of end-to-end control policies for quadrotor UAVs: inefficient visual rendering, inaccurate physical modeling, unmodeled sensor discrepancies, and the absence of a unified training and deployment platform. It proposes E2E-Fly, an integrated framework that combines differentiable physics learning, reinforcement learning, and structured reward design, and that the authors describe as, to the best of their knowledge, the first to systematically unify differentiable physics learning with training, validation, and real-world deployment for quadrotors. The framework also introduces a sim-to-real alignment methodology encompassing system identification, domain randomization, latency compensation, and noise modeling. After a two-stage validation consisting of sim-to-sim transfer and hardware-in-the-loop testing, policies for six end-to-end control tasks are deployed zero-shot on real quadrotors, demonstrating the effectiveness of the framework.
📝 Abstract
Training and transferring learning-based policies for quadrotors from simulation to reality remain challenging due to inefficient visual rendering, physical modeling inaccuracies, unmodeled sensor discrepancies, and the absence of a unified platform integrating differentiable physics learning into end-to-end training. While recent work has demonstrated various end-to-end quadrotor control tasks, few systems provide a systematic, zero-shot transfer pipeline, hindering reproducibility and real-world deployment. To bridge this gap, we introduce E2E-Fly, an integrated framework featuring an agile quadrotor platform coupled with a full-stack training, validation, and deployment workflow. The training framework incorporates a high-performance simulator with support for differentiable physics learning and reinforcement learning, alongside structured reward design tailored to common quadrotor tasks. We further introduce a two-stage validation strategy using sim-to-sim transfer and hardware-in-the-loop testing, and deploy policies onto two physical quadrotor platforms via a dedicated low-level control interface and a comprehensive sim-to-real alignment methodology, encompassing system identification, domain randomization, latency compensation, and noise modeling. To the best of our knowledge, this is the first work to systematically unify differentiable physics learning with training, validation, and real-world deployment for quadrotors. Finally, we demonstrate the effectiveness of our framework by training policies for six end-to-end control tasks and deploying them in the real world.