🤖 AI Summary
This study investigates the robustness of deep reinforcement learning (DRL)-based quadrotor controllers in sim-to-real zero-shot transfer, focusing on how observation input modalities affect transfer performance. We systematically train policies using PPO and SAC in Gazebo and PyBullet across diverse input configurations (including full state, position+velocity only, and pure visual features) and evaluate them via zero-shot deployment on a physical quadrotor platform. This constitutes the first large-scale benchmark analysis of DRL policy input spaces for real-world deployment. Results reveal that redundant state inputs severely degrade transfer robustness: policies trained on the full state fail in over 60% of trials, whereas lightweight position+velocity-only observations achieve >92% task success. We propose a lightweight observation design principle tailored for real-world deployment and uncover a fundamental trade-off between input sparsity and generalization capability, providing critical practical guidance for deploying DRL in physical systems.
📝 Abstract
In the last decade, data-driven approaches have become popular choices for quadrotor control, thanks to their ability to facilitate adaptation to unknown or uncertain flight conditions. Among the different data-driven paradigms, Deep Reinforcement Learning (DRL) is currently one of the most explored. However, the design of DRL agents for Micro Aerial Vehicles (MAVs) remains an open challenge. While some works have studied the output configuration of these agents (i.e., what kind of control to compute), there is no general consensus on the type of input data these approaches should employ. Many works simply provide the DRL agent with full state information, without questioning whether this might be redundant, might unnecessarily complicate the learning process, or might pose superfluous constraints on the availability of such information on real platforms. In this work, we provide an in-depth benchmark analysis of different configurations of the observation space. We optimize multiple DRL agents in simulated environments with different input choices and study their robustness and their sim-to-real transfer capabilities with zero-shot adaptation. We believe that the outcomes and discussions presented in this work, supported by extensive experimental results, could be an important milestone in guiding future research on the development of DRL agents for aerial robot tasks.
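To make the observation-space ablation concrete, here is a minimal sketch of how one might select different input configurations from a full quadrotor state vector before feeding it to a policy. The state layout, slice names, and the `make_observation` helper are illustrative assumptions for this example, not the paper's actual implementation.

```python
import numpy as np

# Assumed full quadrotor state layout (an illustrative choice, not the paper's exact one):
# position (3), linear velocity (3), orientation quaternion (4), angular velocity (3)
STATE_SLICES = {
    "position": slice(0, 3),
    "velocity": slice(3, 6),
    "orientation": slice(6, 10),
    "angular_velocity": slice(10, 13),
}

def make_observation(full_state: np.ndarray, components: list[str]) -> np.ndarray:
    """Build a policy observation from a chosen subset of the full state."""
    return np.concatenate([full_state[STATE_SLICES[c]] for c in components])

# Compare the "full state" configuration with a lightweight "position+velocity" one.
full_state = np.arange(13, dtype=np.float64)
obs_full = make_observation(
    full_state, ["position", "velocity", "orientation", "angular_velocity"]
)
obs_light = make_observation(full_state, ["position", "velocity"])
print(obs_full.shape)   # (13,)
print(obs_light.shape)  # (6,)
```

In a benchmark like the one described, each such configuration would define a separate training run (e.g., a different `Box` observation space in a Gym-style environment), so that policies differ only in which state components they observe.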