🤖 AI Summary
Visual navigation policies suffer from limited generalization due to the scarcity of high-quality annotated data. To address this, we propose Model-Based ReAnnotation (MBRA), a framework that leverages large-scale unlabeled YouTube videos and crowd-sourced teleoperation data. MBRA employs a learned short-horizon, model-based expert to generate high-quality action labels for this passive data, which are then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. Evaluated on a fleet of robots (including quadrupeds) across six cities on three continents, LogoNav achieves state-of-the-art performance, navigating robustly over distances exceeding 300 meters in unseen indoor and outdoor environments and operating reliably among pedestrians in crowded settings. By relabeling cheap, passively collected data, MBRA substantially reduces annotation cost while improving the scalability and generalization of visual navigation policies.
📝 Abstract
Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments. Our extensive real-world evaluations, conducted across a fleet of robots (including quadrupeds) in six cities on three continents, validate the policy's ability to generalize and navigate effectively even amidst pedestrians in crowded settings.
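To make the two-stage recipe concrete, below is a minimal sketch of the relabel-then-distill loop described above. It is an illustration under assumed interfaces, not the authors' released implementation: the class names (`ShortHorizonExpert`, `LongHorizonPolicy`), feature dimensions, and the MSE distillation loss are all hypothetical placeholders standing in for MBRA's expert model and LogoNav.

```python
# Hypothetical sketch of the MBRA pipeline: a short-horizon expert relabels
# passive data, and the relabeled actions supervise a long-horizon policy.
# All names, dimensions, and losses here are illustrative assumptions.
import torch
import torch.nn as nn


class ShortHorizonExpert(nn.Module):
    """Stand-in for the learned short-horizon, model-based expert."""

    def __init__(self, obs_dim=512, act_dim=2, horizon=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim * horizon))
        self.act_dim, self.horizon = act_dim, horizon

    @torch.no_grad()
    def relabel(self, obs):
        # Generate short-horizon action labels for unlabeled/noisy frames.
        return self.net(obs).view(-1, self.horizon, self.act_dim)


class LongHorizonPolicy(nn.Module):
    """Stand-in for LogoNav, conditioned on a visual goal or GPS waypoint."""

    def __init__(self, obs_dim=512, goal_dim=64, act_dim=2, horizon=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim * horizon))
        self.act_dim, self.horizon = act_dim, horizon

    def forward(self, obs, goal):
        x = torch.cat([obs, goal], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)


def distill(expert, policy, passive_obs, goals, steps=100, lr=1e-4):
    """Relabel passive data with the expert, then fit the policy to it."""
    labels = expert.relabel(passive_obs)          # step 1: re-annotation
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):                        # step 2: distillation
        loss = nn.functional.mse_loss(policy(passive_obs, goals), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


# Toy usage: random tensors stand in for encoded video frames and goals.
obs, goals = torch.randn(32, 512), torch.randn(32, 64)
policy = distill(ShortHorizonExpert(), LongHorizonPolicy(), obs, goals)
```

The key design point the sketch captures is the division of labor: the expert only needs to be reliable over short horizons, which is an easier learning problem, while the distilled policy inherits long-horizon, goal-conditioned behavior from the sheer scale of the relabeled passive data.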