🤖 AI Summary
Visual navigation policies suffer from limited generalization due to the scarcity of high-quality annotated data. To address this, we propose Model-Based ReAnnotation (MBRA), a framework that leverages large-scale unlabeled YouTube videos and crowd-sourced teleoperation data. MBRA employs a learned short-horizon, model-based expert to generate high-quality action labels for this passive data, which are then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. Evaluated on a fleet of robots (including quadrupeds) across six cities on three continents, LogoNav achieves state-of-the-art performance, navigating robustly over distances exceeding 300 meters in unseen indoor and outdoor environments and operating reliably among pedestrians in crowded settings. By relabeling cheap, passively collected data, MBRA substantially reduces annotation cost while improving the scalability and generalization of visual navigation policies.
📝 Abstract
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
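To make the two-stage recipe concrete, below is a minimal sketch of the relabel-then-distill loop described above. It is an illustration under assumed interfaces, not the authors' released implementation: the class names (`ShortHorizonExpert`, `LongHorizonPolicy`), feature dimensions, and the MSE distillation loss are all hypothetical placeholders standing in for MBRA's expert model and LogoNav.

```python
# Hypothetical sketch of the MBRA pipeline: a short-horizon expert relabels
# passive data, and the relabeled actions supervise a long-horizon policy.
# All names, dimensions, and losses here are illustrative assumptions.
import torch
import torch.nn as nn


class ShortHorizonExpert(nn.Module):
    """Stand-in for the learned short-horizon, model-based expert."""

    def __init__(self, obs_dim=512, act_dim=2, horizon=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim * horizon))
        self.act_dim, self.horizon = act_dim, horizon

    @torch.no_grad()
    def relabel(self, obs):
        # Generate short-horizon action labels for unlabeled/noisy frames.
        return self.net(obs).view(-1, self.horizon, self.act_dim)


class LongHorizonPolicy(nn.Module):
    """Stand-in for LogoNav, conditioned on a visual goal or GPS waypoint."""

    def __init__(self, obs_dim=512, goal_dim=64, act_dim=2, horizon=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim * horizon))
        self.act_dim, self.horizon = act_dim, horizon

    def forward(self, obs, goal):
        x = torch.cat([obs, goal], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)


def distill(expert, policy, passive_obs, goals, steps=100, lr=1e-4):
    """Relabel passive data with the expert, then fit the policy to it."""
    labels = expert.relabel(passive_obs)          # step 1: re-annotation
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):                        # step 2: distillation
        loss = nn.functional.mse_loss(policy(passive_obs, goals), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


# Toy usage: random tensors stand in for encoded video frames and goals.
obs, goals = torch.randn(32, 512), torch.randn(32, 64)
policy = distill(ShortHorizonExpert(), LongHorizonPolicy(), obs, goals)
```

The key design point the sketch captures is the division of labor: the expert only needs to be reliable over short horizons, which is an easier learning problem, while the distilled policy inherits long-horizon, goal-conditioned behavior from the sheer scale of the relabeled passive data.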