🤖 AI Summary
This work addresses the poor generalization of local navigation policies for mobile robots, particularly humanoid robots, when transferring from simulation to real-world environments. Methodologically, we propose an end-to-end reinforcement learning framework trained in NVIDIA Isaac Sim using the PPO and SAC algorithms. Our approach introduces normalized exteroceptive state representations that enable robust LiDAR–IMU sensor fusion, designs a cross-platform architecture transferable from Isaac Sim to Gazebo, and extends the state and action spaces to accommodate more complex navigation tasks. Contributions and results include: (i) the first demonstration of zero-shot sim-to-real transfer to a physical ROS 2 robot, achieving obstacle avoidance performance on par with Nav2; (ii) strong generalization validated across diverse simulated robot morphologies; and (iii) open-sourcing of the complete codebase, including the training, simulation, and real-robot deployment pipelines.
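A minimal sketch of what a normalized exteroceptive state representation with LiDAR–IMU fusion could look like. The function name, the normalization constants, and the polar goal encoding are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def build_state(lidar_ranges, goal_xy, robot_yaw, max_range=10.0):
    """Hypothetical sketch: normalize raw LiDAR ranges and fuse them with
    goal information (relative to the IMU-derived yaw) into one observation
    vector for the policy. Constants are illustrative placeholders."""
    # Exteroceptive part: clip and scale ranges into [0, 1].
    ranges = np.clip(np.asarray(lidar_ranges, dtype=np.float32), 0.0, max_range)
    ranges_norm = ranges / max_range

    # Goal expressed in polar form in the robot frame.
    dist = np.hypot(goal_xy[0], goal_xy[1])
    heading = np.arctan2(goal_xy[1], goal_xy[0]) - robot_yaw
    heading = np.arctan2(np.sin(heading), np.cos(heading))  # wrap to [-pi, pi]

    goal_part = np.array([dist / max_range, heading / np.pi], dtype=np.float32)
    return np.concatenate([ranges_norm, goal_part])
```

Keeping every component on a comparable scale is what makes the fused observation usable across simulators and robots with different sensor ranges.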
📝 Abstract
Unprecedented agility and dexterous manipulation have been demonstrated with controllers based on deep reinforcement learning (RL), with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have enabled such advances. This article demonstrates the application of Isaac Sim to local planning and obstacle avoidance, one of the most fundamental ways in which a mobile robot interacts with its environment. Although there is extensive research on proprioception-based RL policies, approaches to exteroception remain less standardized and reproducible, and this article highlights them. At the same time, the article provides a base framework for end-to-end local navigation policies and shows how a custom robot can be trained in such a simulation environment. We benchmark end-to-end policies against Nav2, the state-of-the-art navigation stack in the Robot Operating System (ROS). We also cover the sim-to-real transfer process by demonstrating zero-shot transferability of policies trained in the Isaac simulator to real-world robots. This is further evidenced by tests with different simulated robots, which show the generalization of the learned policy. Finally, the benchmarks demonstrate performance comparable to Nav2, opening the door to rapid deployment of state-of-the-art end-to-end local planners on custom robot platforms and, importantly, to more complex missions through expanded state and action spaces or task definitions. Overall, with this article we introduce the most important steps, and the aspects to consider, in deploying RL policies for local path planning and obstacle avoidance, with Isaac Sim for training, Gazebo for testing, and ROS 2 for real-time inference on real robots. The code is available at https://github.com/sahars93/RL-Navigation.
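As a hedged illustration of the real-time inference stage, the sketch below maps a continuous policy action to a velocity command for a differential-drive base; the action bounds, velocity limits, and function names are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

def action_to_cmd_vel(action, v_max=0.5, w_max=1.0):
    """Map a policy action in [-1, 1]^2 to (linear, angular) velocities.
    v_max and w_max are illustrative robot-specific limits."""
    a = np.clip(np.asarray(action, dtype=np.float32), -1.0, 1.0)
    return float(a[0] * v_max), float(a[1] * w_max)

# In a ROS 2 node, the returned pair would typically populate a
# geometry_msgs/msg/Twist message published on /cmd_vel at each control step.
```

Clipping before scaling keeps the deployed robot within its velocity limits even if the trained policy occasionally outputs out-of-range actions.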