🤖 AI Summary
Existing high-speed flight methods often rely on simplified point-mass models, which can yield dynamically infeasible trajectories that are difficult to track. This work proposes an end-to-end reinforcement learning framework that maps depth images directly to body-rate commands without requiring expert demonstrations, explicit mapping, backbone networks, or a separate controller. By training in a high-fidelity, differentiable simulator calibrated through parameter identification, the approach achieves zero-shot transfer to complex real-world outdoor environments and supports full flight-envelope control. Across multiple benchmarks, it achieves the highest success rate and the lowest jerk among state-of-the-art baselines. Notably, the system reaches speeds of up to 7.5 m/s in previously unseen outdoor environments and flies stably in ultra-dense forests.
📝 Abstract
Obstacle avoidance is a fundamental vision-based task and a prerequisite for advanced quadrotor applications. When planning trajectories, existing optimization-based and learning-based approaches typically treat the quadrotor as a point mass, producing path or velocity commands that an outer-loop controller then tracks. At high speeds, however, the planned trajectories can become dynamically infeasible in actual flight, exceeding what the controller can track. In this paper, we propose a novel end-to-end policy, trained by reinforcement learning via differentiable simulation, that maps depth images directly to low-level body-rate commands. Training in a high-fidelity simulator calibrated through parameter identification significantly reduces the gaps between training, simulation, and the real world. The differentiable simulation supplies accurate analytical gradients, which enable efficient training of the low-level policy without expert guidance. The policy uses a lightweight, minimal inference pipeline with no explicit mapping, backbone networks, primitives, recurrent structures, or backend controllers, and its training requires neither a curriculum nor privileged guidance. By sending low-level commands directly to the hardware controller, the method enables full flight-envelope control and avoids the problem of dynamic infeasibility. Experimental results show that the proposed approach achieves the highest success rate and the lowest jerk among state-of-the-art baselines across multiple benchmarks. The policy also generalizes well: it deploys zero-shot in unseen outdoor environments, reaching speeds of up to 7.5 m/s, and flies stably in a super-dense forest.
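The core training idea, optimizing a policy that outputs thrust and body-rate commands using exact gradients obtained by differentiating through the simulated dynamics, can be illustrated on a toy problem. This is a sketch under stated assumptions, not the authors' simulator or training code. It assumes a planar quadrotor with mass-normalized collective thrust and a single body-rate command, and it has no obstacles. In place of the depth-image network it uses a hypothetical 5-gain linear state-feedback policy. It computes the rollout gradient with hand-written forward-mode dual numbers rather than reverse-mode automatic differentiation through a full simulator. The constants `DT`, `STEPS`, `GOAL_X`, the `0.01` rate penalty, and the helpers `rollout_loss` and `train` are all illustrative assumptions.

```python
import math

# Toy sketch (assumptions): planar quadrotor, linear policy, no obstacles, no depth images.
G = 9.81      # gravity [m/s^2]
DT = 0.02     # integration step [s]
STEPS = 100   # 2 s horizon
GOAL_X = 1.0  # hypothetical task: move 1 m sideways while holding altitude z = 0


class Dual:
    """Forward-mode dual number: a value plus its exact partials w.r.t. n parameters."""

    def __init__(self, val, grad):
        self.val = float(val)
        self.grad = tuple(grad)

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other, (0.0,) * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, (a + b for a, b in zip(self.grad, o.grad)))

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, (a - b for a, b in zip(self.grad, o.grad)))

    def __mul__(self, other):
        o = self._lift(other)  # product rule: d(uv) = u dv + v du
        return Dual(self.val * o.val,
                    (a * o.val + self.val * b for a, b in zip(self.grad, o.grad)))

    __rmul__ = __mul__


def d_sin(x):
    return Dual(math.sin(x.val), (math.cos(x.val) * dx for dx in x.grad))


def d_cos(x):
    return Dual(math.cos(x.val), (-math.sin(x.val) * dx for dx in x.grad))


def rollout_loss(params):
    """Simulate (thrust, body-rate) control of a planar quadrotor; return the loss as a Dual."""
    n = len(params)
    k = [Dual(p, (1.0 if j == i else 0.0 for j in range(n))) for i, p in enumerate(params)]
    zero = Dual(0.0, (0.0,) * n)
    x = z = vx = vz = theta = loss = zero
    for _ in range(STEPS):
        ex, ez = x - GOAL_X, z
        # Hypothetical linear policy standing in for the depth-image network.
        rate = k[0] * ex + k[1] * vx + k[2] * theta  # body-rate command [rad/s]
        thrust = G + k[3] * ez + k[4] * vz           # mass-normalized thrust [m/s^2]
        # Thrust acts along the tilted body axis; attitude integrates the commanded rate.
        ax, az = thrust * d_sin(theta), thrust * d_cos(theta) - G
        vx, vz = vx + ax * DT, vz + az * DT
        x, z = x + vx * DT, z + vz * DT
        theta = theta + rate * DT
        # Position error plus an assumed small penalty (weight 0.01) on the commanded rate.
        loss = loss + (ex * ex + ez * ez + 0.01 * rate * rate) * DT
    return loss


def train(params, iters=30, max_step=1.0):
    """Normalized gradient descent on the exact rollout gradient, with backtracking."""
    params = list(params)
    loss = rollout_loss(params)
    history = [loss.val]
    step = max_step
    for _ in range(iters):
        norm = math.sqrt(sum(g * g for g in loss.grad))
        if not norm > 0.0:
            break  # zero or non-finite gradient
        direction = [g / norm for g in loss.grad]
        accepted = False
        for _ in range(20):  # halve the step until the loss decreases
            trial = [p - step * d for p, d in zip(params, direction)]
            try:
                trial_loss = rollout_loss(trial)
            except (ValueError, OverflowError):  # diverged rollout, e.g. sin(inf)
                trial_loss = None
            if trial_loss is not None and trial_loss.val < loss.val:
                params, loss, accepted = trial, trial_loss, True
                break
            step *= 0.5
        if not accepted:
            break
        history.append(loss.val)
        step = min(2.0 * step, max_step)  # let the step grow back, capped
    return params, history
```

Calling `train([0.0] * 5)` starts from a pure hover policy. Its loss is about 2.0, because the 1 m position error persists over the whole 2 s horizon. The call returns tuned gains together with a loss history that decreases at every accepted step. In this toy model, the policy commands a body rate rather than a point-mass acceleration. The gradient therefore has to account for how the attitude, and hence the acceleration, lags the command. A point-mass planner ignores this coupling.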