🤖 AI Summary
This work proposes NavGSim, a navigation simulator built on hierarchical 3D Gaussian splatting, to address the lack of high-fidelity, large-scale simulation environments for robotic navigation. It extends 3D Gaussian splatting to floor-scale scenes spanning hundreds of square meters, with photorealistic rendering and collision simulation. A Gaussian slicing technique extracts navigable regions and collision information directly from the reconstructed Gaussians. NavGSim also provides multi-GPU support and APIs for custom scene reconstruction, robot configuration, policy training, and evaluation. In the reported experiments, a Vision-Language-Action (VLA) model trained on trajectories collected in NavGSim shows significantly improved scene understanding and handles diverse navigation queries in both simulated and real-world environments.
📝 Abstract
Simulating realistic environments for robots is widely recognized as a critical challenge in robot learning, particularly in terms of rendering and physical simulation. This challenge becomes even more pronounced in navigation tasks, where trajectories often extend across multiple rooms or entire floors. In this work, we present NavGSim, a Gaussian Splatting-based simulator designed to generate high-fidelity, large-scale navigation environments. Built upon a hierarchical 3D Gaussian Splatting framework, NavGSim enables photorealistic rendering in expansive scenes spanning hundreds of square meters. To simulate navigation collisions, we introduce a Gaussian Splatting-based slicing technique that extracts navigable areas directly from the reconstructed Gaussians. Additionally, for ease of use, we provide comprehensive NavGSim APIs with multi-GPU support, including tools for custom scene reconstruction, robot configuration, policy training, and evaluation. To evaluate NavGSim's effectiveness, we train a Vision-Language-Action (VLA) model using trajectories collected from NavGSim and assess its performance in both simulated and real-world environments. Our results demonstrate that NavGSim significantly enhances the VLA model's scene understanding, enabling the policy to handle diverse navigation queries effectively.
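The abstract does not spell out how the slicing step works, so the following is only a rough illustration of the general idea of reading navigable area off reconstructed Gaussians, not NavGSim's actual algorithm or API. It assumes each Gaussian is reduced to its center and opacity, that z points up, and that a cell is navigable when it has floor Gaussians in a thin band near the ground and no sufficiently opaque Gaussians in the height band swept by the robot body. The function name `navigable_mask` and all of its parameters are hypothetical.

```python
import numpy as np


def navigable_mask(centers, opacities, floor_band, body_band,
                   cell=0.1, min_opacity=0.5):
    """Sketch: slice Gaussian centers by height and rasterize to a 2D grid.

    centers:    (N, 3) array of Gaussian means, z assumed to point up.
    opacities:  (N,) array of per-Gaussian opacities in [0, 1].
    floor_band: (z_lo, z_hi) height range treated as floor.
    body_band:  (z_lo, z_hi) height range swept by the robot body.
    Returns (mask, origin); mask[i, j] is True where the cell has floor
    support and no opaque Gaussian inside the body band.
    """
    centers = np.asarray(centers, dtype=float)
    opacities = np.asarray(opacities, dtype=float)

    # Grid layout covering the xy extent of all Gaussians.
    origin = centers[:, :2].min(axis=0)
    extent = centers[:, :2].max(axis=0) - origin
    shape = tuple(np.floor(extent / cell).astype(int) + 1)

    opaque = opacities >= min_opacity

    def count_in_band(band):
        # Count opaque Gaussians per cell whose center lies in the band.
        z = centers[:, 2]
        keep = opaque & (z >= band[0]) & (z <= band[1])
        idx = np.floor((centers[keep, :2] - origin) / cell).astype(int)
        counts = np.zeros(shape, dtype=int)
        np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
        return counts

    has_floor = count_in_band(floor_band) > 0
    blocked = count_in_band(body_band) > 0
    return has_floor & ~blocked, origin
```

A real system would likely account for each Gaussian's covariance (its spatial extent) rather than only its center, and would dilate obstacles by the robot radius; both are omitted here to keep the sketch short.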