🤖 AI Summary
Long-range navigation in unknown outdoor environments suffers from myopic decision-making due to limited local perception.
Method: This paper proposes an end-to-end goal-aligned frontier direction prediction framework. Unlike conventional approaches that rely on long-range mapping or prior maps, it introduces the traversable frontier direction as a long-range semantic planning representation and learns a goal-consistent directional mapping directly from monocular video. The method adopts a self-supervised paradigm, requiring only unlabeled first-person videos for training. It integrates a visual affordance representation, direction alignment optimization, and a lightweight convolutional encoder, and is plug-and-play compatible with existing navigation stacks.
Results: Real-world evaluations on Boston Dynamics Spot and a large off-road vehicle demonstrate significantly fewer human interventions and lower decision latency, validating its long-horizon robustness in prior-free scenarios and its cross-platform practicality.
📝 Abstract
A robot navigating an outdoor environment with no prior knowledge of the space must rely on its local sensing to perceive its surroundings and plan. This can come in the form of a local metric map or a local policy with some fixed horizon. Beyond that lies a fog of unknown space marked with some fixed cost. A limited planning horizon can often result in myopic decisions that lead the robot off course or, worse, into very difficult terrain. Ideally, we would like the robot to have full knowledge of a space that can be orders of magnitude larger than a local cost map. In practice, this is intractable due to sparse sensing information and is often computationally expensive. In this work, we make a key observation: long-range navigation only requires identifying good frontier directions for planning, not full map knowledge. To this end, we propose Long Range Navigator (LRN), which learns an intermediate affordance representation mapping high-dimensional camera images to `affordable' frontiers for planning, and then optimizes for maximum alignment with the desired goal. Notably, LRN is trained entirely on unlabeled ego-centric videos, making it easy to scale and adapt to new platforms. Through extensive off-road experiments on Spot and a Big Vehicle, we find that augmenting existing navigation stacks with LRN reduces human interventions at test time and leads to faster decision making, indicating the relevance of LRN. https://personalrobotics.github.io/lrn
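The core decision the abstract describes, scoring candidate frontier directions by predicted traversability and then picking the one best aligned with the goal, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual formulation: the function names, the `(bearing, affordance)` representation of a frontier, and the additive scoring rule are all assumptions made here for clarity.

```python
import math

def select_frontier(frontiers, goal_bearing, alignment_weight=1.0):
    """Pick the frontier direction that best trades off predicted
    traversability ('affordance') against alignment with the goal.

    NOTE: hypothetical interface; LRN's actual learned scoring differs.

    frontiers: list of (bearing_rad, affordance) pairs, where bearing_rad
        is the frontier's heading in radians and affordance is a
        predicted traversability score in [0, 1]
    goal_bearing: desired heading toward the goal, in radians
    """
    def score(bearing, affordance):
        # cos(delta) is 1 when the frontier points straight at the goal
        # and -1 when it points directly away.
        alignment = math.cos(bearing - goal_bearing)
        return affordance + alignment_weight * alignment

    return max(frontiers, key=lambda f: score(*f))

# Usage: three candidate frontiers; the goal lies nearly straight ahead.
frontiers = [(0.0, 0.9), (math.pi / 2, 0.95), (math.pi, 0.2)]
best = select_frontier(frontiers, goal_bearing=0.1)
# Picks the frontier at bearing 0.0: slightly lower affordance than the
# one at pi/2, but far better aligned with the goal.
```

The point of the intermediate representation is exactly this decoupling: perception only has to propose traversable directions, and goal alignment is a cheap geometric step on top, which is what lets LRN plug into existing navigation stacks.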