🤖 AI Summary
This paper addresses zero-shot autonomous navigation for unmanned ground vehicles (UGVs) in previously untraversed, GPS-denied environments with an "aerial-observation-to-ground-execution" approach. Aerial imagery of the target environment is used to train a neural radiance field (NeRF) model, from which dense point clouds and a photo-textured mesh are extracted. The mesh supports a high-fidelity simulation in which a UGV is piloted to define the desired path virtually. NeRF-derived point-cloud submaps associated along that path then serve as the teach map in an existing LiDAR Teach-and-Repeat (LT&R) framework, which handles localization against the live LiDAR data and closed-loop path tracking in the real environment. The work is presented as the first to use a NeRF reconstruction directly as the navigation map for LT&R, removing the need for a manual on-site demonstration. In real-world experiments covering more than 12 km of autonomous driving, the method achieves path-tracking RMSEs of 19.5 cm and 18.4 cm in two environments, both below one tire width (24 cm), with maximum errors of 39.4 cm and 47.6 cm respectively. This closed-loop performance is comparable to that of conventional, manually taught LT&R.
📝 Abstract
This paper presents Virtual Teach and Repeat (VirT&R): an extension of the Teach and Repeat (T&R) framework that enables GPS-denied, zero-shot autonomous ground vehicle navigation in untraversed environments. VirT&R leverages aerial imagery captured for a target environment to train a Neural Radiance Field (NeRF) model so that dense point clouds and photo-textured meshes can be extracted. The NeRF mesh is used to create a high-fidelity simulation of the environment for piloting an unmanned ground vehicle (UGV) to virtually define a desired path. The mission can then be executed in the actual target environment by using NeRF-derived point cloud submaps associated along the path and an existing LiDAR Teach and Repeat (LT&R) framework. We benchmark the repeatability of VirT&R on over 12 km of autonomous driving data using physical markings that allow a sim-to-real lateral path-tracking error to be obtained and compared with LT&R. VirT&R achieved measured root mean squared errors (RMSE) of 19.5 cm and 18.4 cm in two different environments, which are slightly less than one tire width (24 cm) on the robot used for testing, and respective maximum errors were 39.4 cm and 47.6 cm. This was done using only the NeRF-derived teach map, demonstrating that VirT&R has similar closed-loop path-tracking performance to LT&R but does not require a human to manually teach the path to the UGV in the actual environment.
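The reported metrics can be illustrated with a minimal sketch of how an RMSE and a maximum absolute error are computed from signed lateral path-tracking errors and compared with the 24 cm tire width. The function name `lateral_error_stats` and the sample values are hypothetical and chosen only for illustration; they are not the authors' evaluation code or data.

```python
import math

TIRE_WIDTH_CM = 24.0  # tire width stated in the abstract


def lateral_error_stats(errors_cm):
    """Return (rmse, max_abs) for signed lateral errors given in cm."""
    if not errors_cm:
        raise ValueError("need at least one error sample")
    # RMSE: square root of the mean of the squared errors.
    rmse = math.sqrt(sum(e * e for e in errors_cm) / len(errors_cm))
    # Maximum error is taken over magnitudes, ignoring left/right sign.
    max_abs = max(abs(e) for e in errors_cm)
    return rmse, max_abs


# Hypothetical samples (cm), NOT measurements from the paper.
samples_cm = [3.0, -4.0, 0.0, 5.0]
rmse, max_abs = lateral_error_stats(samples_cm)
print(f"RMSE = {rmse:.2f} cm, max |error| = {max_abs:.1f} cm")
print("RMSE below one tire width:", rmse < TIRE_WIDTH_CM)
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, which is why the maximum error is reported alongside it.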