NavRL: Learning Safe Flight in Dynamic Environments

📅 2024-09-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Safe UAV navigation in dynamic obstacle environments remains challenging because conventional hierarchical prediction-and-planning architectures rely on restrictive modeling assumptions and manual parameter tuning. Method: This paper proposes NavRL, an end-to-end deep reinforcement learning framework built on the Proximal Policy Optimization (PPO) algorithm. It integrates a safety shield inspired by velocity obstacles (VO), employs carefully designed state and action representations, and enables large-scale parallel training in NVIDIA Isaac Sim, supporting zero-shot sim-to-real transfer. Contribution/Results: Extensive experiments demonstrate real-time collision avoidance in mixed static-dynamic obstacle scenarios, with significantly fewer collisions than baseline methods, strong generalization to unseen environments and obstacle dynamics, and robust deployment on physical platforms.
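The summary names PPO as the underlying algorithm. As a quick illustration only (not the paper's code), the clipped surrogate objective at the heart of PPO can be sketched for a single sample; `ratio` is the new-to-old policy probability ratio and `advantage` the estimated advantage:

```python
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate loss (negated, so lower is better).

    The ratio is clipped to [1 - eps, 1 + eps] and the pessimistic
    (minimum) of the clipped and unclipped objectives is taken, which
    limits how far a single update can move the policy.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return -min(unclipped, clipped)
```

In practice this loss is averaged over a minibatch and combined with value and entropy terms; the sketch only shows the clipping mechanism itself.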

📝 Abstract
Safe flight in dynamic environments requires unmanned aerial vehicles (UAVs) to make effective decisions when navigating cluttered spaces with moving obstacles. Traditional approaches often decompose decision-making into hierarchical modules for prediction and planning. Although these handcrafted systems can perform well in specific settings, they might fail if environmental conditions change and often require careful parameter tuning. Additionally, their solutions could be suboptimal due to the use of inaccurate mathematical model assumptions and simplifications aimed at achieving computational efficiency. To overcome these limitations, this paper introduces the NavRL framework, a deep reinforcement learning-based navigation method built on the Proximal Policy Optimization (PPO) algorithm. NavRL utilizes our carefully designed state and action representations, allowing the learned policy to make safe decisions in the presence of both static and dynamic obstacles, with zero-shot transfer from simulation to real-world flight. Furthermore, the proposed method adopts a simple but effective safety shield for the trained policy, inspired by the concept of velocity obstacles, to mitigate potential failures associated with the black-box nature of neural networks. To accelerate the convergence, we implement the training pipeline using NVIDIA Isaac Sim, enabling parallel training with thousands of quadcopters. Simulation and physical experiments show that our method ensures safe navigation in dynamic environments and results in the fewest collisions compared to benchmarks.
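The abstract describes a safety shield inspired by velocity obstacles. A minimal 2D sketch of that general idea follows (hypothetical helper names, assuming a single circular obstacle; not the authors' implementation): if the commanded velocity, taken relative to the obstacle, points into the collision cone, rotate it to the nearer cone edge; otherwise pass it through unchanged.

```python
import math

def in_velocity_obstacle(p_rel, v_rel, r):
    """True if relative velocity v_rel points into the collision cone
    toward an obstacle at relative position p_rel with combined radius r."""
    d = math.hypot(*p_rel)
    if d <= r:
        return True  # already overlapping
    half_angle = math.asin(r / d)
    speed = math.hypot(*v_rel)
    if speed == 0.0:
        return False
    cos_theta = (p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1]) / (d * speed)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return theta < half_angle

def shield(v_cmd, p_robot, p_obs, v_obs, r):
    """Filter a commanded velocity: if it would enter the VO cone,
    rotate the relative velocity to the nearer cone boundary."""
    p_rel = (p_obs[0] - p_robot[0], p_obs[1] - p_robot[1])
    v_rel = (v_cmd[0] - v_obs[0], v_cmd[1] - v_obs[1])
    if not in_velocity_obstacle(p_rel, v_rel, r):
        return v_cmd
    d = math.hypot(*p_rel)
    half_angle = math.asin(min(1.0, r / d))
    base = math.atan2(p_rel[1], p_rel[0])
    ang = math.atan2(v_rel[1], v_rel[0])
    # signed angle from the cone axis, wrapped to (-pi, pi]
    diff = (ang - base + math.pi) % (2.0 * math.pi) - math.pi
    edge = base + math.copysign(half_angle, diff if diff != 0.0 else 1.0)
    speed = math.hypot(*v_rel)
    v_rel_safe = (speed * math.cos(edge), speed * math.sin(edge))
    return (v_rel_safe[0] + v_obs[0], v_rel_safe[1] + v_obs[1])
```

The paper's actual shield also has to handle multiple obstacles, 3D motion, and velocity limits; the sketch only conveys the cone-membership test and the projection step.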
Problem

Research questions and friction points this paper is trying to address.

Safe UAV navigation in dynamic environments
Overcoming limitations of traditional hierarchical modules
Zero-shot transfer from simulation to real-world flight
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep reinforcement learning-based navigation
Proximal Policy Optimization algorithm
Safety shield for trained policy