🤖 AI Summary
This work addresses the challenge of achieving reliable 3D perception and obstacle avoidance in unstructured off-road environments without relying on costly LiDAR sensors. We propose a lightweight monocular vision-based alternative that, for the first time, effectively integrates zero-shot monocular depth estimation from the foundation model Depth Anything V2 into an off-road navigation system. Metric scale is recovered by fusing the relative depth predictions with sparse, metrically scaled measurements from the visual-inertial SLAM system VINS-Mono, while edge masking and temporal smoothing are introduced to mitigate hallucinated obstacles and SLAM instability. The resulting robot-centric 2.5D elevation map enables robust costmap-based path planning. Requiring no task-specific training, our system matches the navigation performance of high-resolution LiDAR in most scenarios, in both photorealistic Isaac Sim simulations and real-world field tests. We open-source the complete navigation stack and simulation environment to provide a reproducible benchmark.
📝 Abstract
Off-road autonomous navigation demands reliable 3D perception for robust obstacle detection in challenging unstructured terrain. While LiDAR is accurate, it is costly and power-intensive. Monocular depth estimation using foundation models offers a lightweight alternative, but its integration into outdoor navigation stacks remains underexplored. We present an open-source off-road navigation stack supporting both LiDAR and monocular 3D perception without task-specific training. For the monocular setup, we combine zero-shot depth prediction (Depth Anything V2) with metric depth rescaling using sparse SLAM measurements (VINS-Mono). Two key enhancements improve robustness: edge masking to reduce obstacle hallucination and temporal smoothing to mitigate the impact of SLAM instability. The resulting point cloud is used to generate a robot-centric 2.5D elevation map for costmap-based planning. Evaluated in photorealistic simulations (Isaac Sim) and real-world unstructured environments, the monocular configuration matches high-resolution LiDAR performance in most scenarios, demonstrating that foundation-model-based monocular depth estimation is a viable LiDAR alternative for robust off-road navigation. By open-sourcing the navigation stack and the simulation environment, we provide a complete pipeline for off-road navigation as well as a reproducible benchmark. Code available at https://github.com/LARIAD/Offroad-Nav.
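To make the monocular pipeline above concrete, here is a minimal NumPy sketch of its three core steps: rescaling relative depth to metric scale against sparse SLAM landmark depths, masking high-gradient depth edges where hallucinated obstacles tend to appear, and exponential temporal smoothing. All function names, the linear scale/shift model, and the gradient threshold are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import numpy as np

def align_depth_to_metric(rel_depth, sparse_uv, sparse_z):
    """Fit a scale/shift so relative depth matches sparse metric depths.

    rel_depth : (H, W) relative depth from the foundation model.
    sparse_uv : (N, 2) pixel coordinates (u, v) of SLAM landmarks.
    sparse_z  : (N,) metric depths of those landmarks (e.g. from VINS-Mono).
    """
    r = rel_depth[sparse_uv[:, 1], sparse_uv[:, 0]]
    A = np.stack([r, np.ones_like(r)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, sparse_z, rcond=None)
    return scale * rel_depth + shift

def edge_mask(depth, grad_thresh=0.5):
    """Keep only pixels whose depth gradient is below a threshold;
    sharp depth discontinuities are where hallucinated obstacles occur."""
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy) < grad_thresh  # True = keep

class TemporalSmoother:
    """Exponential moving average over per-frame depth (or scale) estimates,
    damping jumps caused by transient SLAM instability."""
    def __init__(self, alpha=0.8):
        self.alpha = alpha
        self.state = None

    def update(self, x):
        if self.state is None:
            self.state = x
        else:
            self.state = self.alpha * self.state + (1.0 - self.alpha) * x
        return self.state
```

The metrically scaled, edge-masked depth would then be back-projected into a point cloud and accumulated into the robot-centric 2.5D elevation map used by the planner.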