AI Summary
To address inaccurate terrain perception in off-road navigation caused by limited field-of-view, occlusions, and low-resolution distant observations from onboard vehicle cameras, this paper proposes an aerial-ground collaborative self-supervised terrain representation method. It leverages overhead imagery captured by a hovering UAV and jointly models it with ground-vehicle proprioceptive signals, such as vibration, bumpiness, and energy consumption, to establish a cross-view aligned multimodal self-supervised learning framework. This work is the first to incorporate UAV aerial imagery into self-supervised terrain perception training. A lightweight prediction network is designed and evaluated on 2.8 km of real-world forest terrain data, achieving a 21.37% improvement in overall terrain attribute prediction accuracy and a 37.35% gain in high-vegetation scenarios. A closed-loop validation, from UAV pre-survey and path planning to autonomous execution on the ground, is demonstrated, significantly enhancing navigation robustness in complex, unstructured environments.
Abstract
Terrain awareness is an essential milestone to enable truly autonomous off-road navigation. Accurately predicting terrain characteristics allows optimizing a vehicle's path against potential hazards. Recent methods use deep neural networks to predict traversability-related terrain properties in a self-supervised manner, relying on proprioception as a training signal. However, onboard cameras are inherently limited by their point of view relative to the ground, suffering from occlusions and vanishing pixel density with distance. This paper introduces a novel approach for self-supervised terrain characterization using an aerial perspective from a hovering drone. We capture terrain-aligned images while sampling the environment with a ground vehicle, effectively training a simple predictor for vibrations, bumpiness, and energy consumption. Our dataset includes 2.8 km of off-road data collected in a forest environment, comprising 13,484 ground-based images and 12,935 aerial images. Our findings show that drone imagery improves terrain property prediction by 21.37% on the whole dataset and 37.35% in high vegetation, compared to ground robot images. We conduct ablation studies to identify the main causes of these performance improvements. We also demonstrate the real-world applicability of our approach by scouting an unseen area with a drone, then planning and executing an optimized path on the ground.
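The core training recipe above, pairing terrain-aligned aerial image patches with proprioceptive labels recorded while the ground vehicle drives over that terrain, can be sketched as follows. This is a minimal illustration, not the authors' code: the hand-crafted `patch_features` function, the synthetic data, and the ridge-regression stand-in for the paper's lightweight prediction network are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_features(patch):
    """Crude hand-crafted features for an HxWx3 aerial patch:
    per-channel means plus overall intensity variance.
    (The paper uses a learned network instead.)"""
    return np.concatenate([patch.mean(axis=(0, 1)), [patch.var()]])

# Synthetic dataset: each sample is an aerial patch centered on the
# vehicle's pose; its self-supervised label is the proprioception
# (vibration, bumpiness, energy) measured while traversing that spot.
n = 200
patches = rng.random((n, 16, 16, 3))
X = np.stack([patch_features(p) for p in patches])              # (n, 4)
true_W = rng.random((4, 3))
Y = X @ true_W + 0.01 * rng.standard_normal((n, 3))             # (n, 3)

# Fit a linear predictor by ridge regression:
#   W = (X^T X + lam I)^{-1} X^T Y
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

pred = X @ W                      # predicted terrain properties
mse = float(np.mean((pred - Y) ** 2))
```

At deployment time, the same predictor would be applied to aerial patches of unvisited terrain, yielding a property map that a planner can score paths against, as in the drone pre-survey demonstration.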